Abstract

Our goal under this three-year contract was to build an adaptive planner for a real-time,
uncertain environment. The domain we chose was forest fire fighting, for
which  we  built  a  simulator  of  forest  fires  and  autonomous  agents  tasked  to  control 
them.  In  the  first  year  of  the  contract  we  built  a  flexible  agent  architecture  with  a 
variety  of  adaptive  mechanisms  that  makes  Phoenix  agents  responsive  to  the 
changing  demands  of  the  task  environment.  We  demonstrated  that  our  multi-agent 
planning  system  built  on  this  architecture  can  successfully  fight  simulated  fires 
under  a  variety  of  circumstances. 

In  the  second  year  of  the  contract  we  have  focused  on  why  our  planning  system 
works, and whether it works well. Our ongoing inquiry into the proper role of
evaluation  in  AI  system  building  led  us  to  the  development  of  a  new  approach  to  AI 
research: modeling and analyzing the relationship between the task environment
and the agent design. Modeling AI architectures mathematically is a promising
innovation  that  should  provide  the  basis  for  much-needed  evaluation  and  analysis. 
Building  agents  that  "work"  is  not  enough.  To  prove  that  we  understand  how  they 
work,  we  must  model  them  precisely  enough  to  identify  and  correct  the  inevitable 
design inefficiencies. During the second year we applied this approach to our work in
Phoenix, developing models of the fire-fighting task environment and Phoenix agent
design that enabled us to form and test hypotheses about how our agents perform.

In  this  Final  Technical  Report  we  summarize  results  from  the  first  two  years,  during 
which  we  1)  built  an  adaptive  planning  architecture  for  a  complex,  real-time  task 
environment  and  a  testbed  for  its  principled  analysis;  and  2)  developed  a  model-based 
methodological  approach  and  used  it  to  analyze  numerous  aspects  of  the  Phoenix 
agent  architecture.  We  then  describe  the  five  culminating  accomplishments  of  our 
research  in  the  third  and  final  year: 

1.  development  of  a  procedure  we  call  failure  recovery  analysis  (FRA),  for 
analyzing  execution  traces  of  failure  recovery  to  discover  when  and  how  the 
planner’s  actions  may  be  causing  failures; 

2.  extending  our  previous  work  with  envelopes  with  the  development  of  a  simple 
one-parameter  decision  rule  called  a  slack  time  envelope; 

3. taking several steps toward formalizing the problem of plan execution
monitoring; 

4. building causal models of AI program behavior using path analysis;

5.  expanding  the  scope  of  our  methodological  approach  and  authoring  a  textbook 
on  empirical  methods  for  Artificial  Intelligence. 

1. Numerical Productivity Measures

Refereed  papers  published:  16 
Refereed papers submitted: 3
Invited  papers  published:  1 

Refereed workshop abstracts and symposia papers: 8

Books  or  parts  thereof  published:  3 

Ph.D.  dissertations:  1 

Unrefereed  reports  and  articles:  10 

Invited  presentations:  20 

Contributed  presentations:  13 

Tutorials: 2

Honors,  including  conference  committees:  8 
Graduate  students  supported  at  least  25%  time:  8 

2. Executive Summary

2.1.  Summary  of  Technical  Results 

2.1.1. Accomplishments in the First Two Contract Years

Our original goal under this contract was to build a real-time, adaptive planner,
based  on  an  agent  architecture  capable  of  integrating  multiple  planning  methods. 

The  problem  domain  we  chose  was  forest  fire  fighting.  We  built  a  simulator  of  forest 
fires  and  autonomous  agents  tasked  to  control  them.  This  system,  which  we  called 
Phoenix,  consists  of  an  instrumented  discrete  event  simulation,  an  architectural 
shell  for  autonomous  agents  that  integrates  multiple  planning  methods,  and  an 
organization  of  planning  agents  capable  of  improving  their  fire-fighting  performance 
by  adapting  to  the  simulated  environment. 

Our original approach to this large research endeavor was to 1) build a realistic
simulated  world,  2)  build  simulated  autonomous  agents  to  solve  problems  in  the 
world,  and  3)  conduct  experiments  designed  to  demonstrate  that  our  solution 
"worked".  Thus,  in  the  first  year  of  this  contract  we  designed  a  flexible  agent 
architecture  with  a  variety  of  mechanisms  supporting  the  following:  delayed 
commitment of resources in the face of a dynamic environment; real-time control of
cognitive  processes;  sophisticated  monitoring  of  change  and  progress;  the  ability  to 
react  very  quickly  (reflexively)  to  sudden  changes  in  the  world;  and  learning  (Cohen  et 
al.  1989).  However,  during  this  first  year  our  ongoing  inquiry  into  the  proper  role  of 
evaluation in AI system building (Cohen 1991b) led us to the development of a new
approach  to  AI  research.  We  still  believe  in  the  importance  of  steps  1  and  2  above,  but 
would  substitute  the  following  steps  for  the  third:  3)  analyze  the  task  environment 
and  the  design  of  the  agents  by  modeling  them  mathematically;  4)  use  the  models  to 
predict  the  performance  of  proposed  designs;  5)  verify  predicted  performance  and 
identify  design  optimizations  empirically;  6)  implement  the  optimized  designs  and 
test  them. 

Modeling AI architectures mathematically is a promising innovation that should
provide  the  basis  for  much-needed  evaluation  and  analysis  (Cohen  1991b).  Building 
agents  that  "work"  is  not  enough.  To  prove  that  we  understand  how  they  work,  we 
must  model  them  precisely  enough  to  identify  and  correct  the  inevitable  design 
inefficiencies.  In  the  second  year  of  the  contract  we  applied  this  approach  to  our  work 
in  Phoenix,  developing  models  of  the  Phoenix  task  environment  and  agent  design  that 
enable  us  to  form  and  test  hypotheses  about  how  our  agents  perform.  A  sampling  of 
these modeling efforts includes:

•  Refining  our  view  of  modeling  and  the  critical  role  we  feel  it  plays  in  putting 
AI  on  a  firm  scientific  footing. 

• Developing cost models for recovery from plan failures (Howe & Cohen 1991),
along with an accompanying analysis of several problems that arose from
this cost model.

• Developing models of wind dynamics and their effect on fire spread (Hansen
1990a,  Hansen  1990b).  Models  such  as  these  capture  the  kinds  of 
environmental  constraints  imposed  on  agents  operating  in  this  domain. 

•  Developing  several  related  models  of  optimal  fire-fighting  strategies  in 
Phoenix. These models generated hypotheses that we subsequently tested in
large  empirical  trials  (Cohen,  Hart  &  Devadoss  1990). 

• Developing an analysis of the utility of envelopes in Phoenix by examining the
time interval between monitoring events (Anderson & Hart 1990).

• Analyzing an abstracted model of the cognitive scheduling problem for the
fireboss  agent  in  Phoenix  (Anderson,  Hart  &  Cohen  1991). 

These  are  described  in  detail  in  the  1992  Annual  Technical  Report. 

2.1.2.  Accomplishments  in  the  Final  Year 

Our  work  in  the  third  (final)  year  of  this  contract  produced  five  important 
accomplishments: 

1. developing a procedure we call failure recovery analysis (FRA), for analyzing
execution traces of failure recovery to discover when and how the planner's
actions  may  be  causing  failures; 

2. extending our previous work with envelopes with the development of a simple
one-parameter  decision  rule  called  a  slack  time  envelope; 

3. taking several steps toward formalizing the problem of plan execution
monitoring; 

4. building causal models of AI program behavior using path analysis;

5.  expanding  the  scope  of  our  methodological  approach  and  authoring  a  textbook 
on  empirical  methods  for  Artificial  Intelligence. 

Each  of  these  accomplishments  is  discussed  in  turn  in  Sections  2.1.3  -  2.1.7. 

2.1.3. Failure Recovery Analysis

The third year of the contract saw the completion of Adele Howe's thesis on failure
recovery in Phoenix, in which she developed a new approach to debugging AI
planning systems (which can also be extended to debugging other large AI systems).
The development of this approach can be traced through many of the papers
sponsored by this contract, including Howe & Cohen (1990), Howe & Cohen (1991),
Howe (1992), and Howe (1993). Several articles have recently been submitted on this
subject, including one to the Artificial Intelligence Journal about evaluating AI
planner behavior, and one to IEEE Transactions on Knowledge and Data Engineering
about generalizing this approach to debugging failures in large software systems.

This approach is briefly outlined in this section, and more fully explained in Section 3.

Plans fail for perfectly good reasons: the environment changes unpredictably, sensors
return flaky data, and effectors do not work as expected. During planner development,
plans fail for not so good reasons: the effects of actions are not adequately specified,
apparently unrelated actions interact, and the domain model is incomplete and
incorrect.  Planners  should  not  cause  their  own  failures,  but  figuring  out  what  went 
wrong  and  preventing  it  later  is  not  easy.  Failures  tell  us  what  went  wrong,  but  not 
why.  The  failure  repair  alleviates  the  immediate  problem,  but  does  not  tell  us  how  to 
fix the cause or even whether the repair itself might not cause failures later. We have
developed a procedure, called failure recovery analysis (FRA), for analyzing execution
traces  of  failure  recovery  to  discover  when  and  how  the  planner’s  actions  may  be 
causing  failures  (Howe  1993). 

Most  approaches  to  debugging  planners  are  knowledge  intensive,  assuming  that  the 
planner  or  debugger  has  a  strong  model  of  the  domain.  The  approach  we  have 
developed,  FRA,  requires  little  knowledge  to  identify  contributors  to  failures  and  only  a 
weak  model  to  explain  how  the  planner  might  have  caused  failures.  Complementary 
to  the  more  knowledge  intensive  approaches,  this  approach  is  most  appropriate  when 
a  rich  domain  model  is  not  available  or  when  the  existing  model  might  be  incorrect  or 
buggy,  as  when  the  system  is  under  development. 

The  consequence  of  relying  on  a  weak  model  is  that  while  FRA  can  detect  possible 
causes  of  the  failure,  it  cannot  identify  the  cause  precisely  enough  to  implement  a 
repair.  Debugging  a  planner  requires  judgment  about  what  would  be  the  best 
modification  and  whether  the  failure  is  worth  avoiding  at  all.  In  repairing  one 
failure,  others  might  be  introduced.  In  FRA,  the  designer  decides  how  best  to  repair 
the  failures. 

Failure  recovery  analysis  involves  four  steps: 

1.  execution  traces  are  searched  for  statistically  significant  dependencies  between 
recovery  efforts  and  subsequent  failures; 

2. dependencies are mapped to structures in the planner’s knowledge base known
to  be  susceptible  to  failure; 

3. the interactions and vulnerable plan structures are used to generate
explanations  of  why  the  failures  occur; 

4. the explanations serve to separate occasional, acceptable failures from chronic,
unacceptable failures, and recommend redesigns of the planner and recovery
component.

These  steps  are  described  in  detail  and  examples  of  their  use  in  Phoenix  given  in 
Section  3. 
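As an illustration of step 1 only, the following minimal sketch searches a trace for recovery actions that are disproportionately followed by particular failures. It is not the procedure from Howe (1993): the trace format (a list of recovery/next-failure pairs), the action and failure names, the use of a G-test on a 2x2 contingency table, and the 0.05 significance threshold are all assumptions made for the example. Note that the test is two-sided, so it flags pairs that co-occur less often than expected as well as more often.

```python
import math
from collections import Counter

def g_statistic(table):
    """G-test statistic for a 2x2 contingency table [[a, b], [c, d]]."""
    (a, b), (c, d) = table
    total = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = rows[i] * cols[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g

def find_dependencies(trace, critical=3.841):
    """Return (recovery, failure, G) triples whose association is significant.

    `trace` is a hypothetical format: (recovery_action, next_failure) pairs.
    3.841 is the chi-square critical value for 1 degree of freedom, p = 0.05.
    """
    pair_counts = Counter(trace)
    recovery_counts = Counter(r for r, _ in trace)
    failure_counts = Counter(f for _, f in trace)
    n = len(trace)
    significant = []
    for (r, f), a in pair_counts.items():
        b = recovery_counts[r] - a   # r followed by some other failure
        c = failure_counts[f] - a    # f preceded by some other recovery
        d = n - a - b - c            # neither r nor f
        g = g_statistic([[a, b], [c, d]])
        if g > critical:
            significant.append((r, f, round(g, 2)))
    return sorted(significant, key=lambda item: -item[2])

# Hypothetical trace: "replan" is usually followed by "path-blocked".
trace = ([("replan", "path-blocked")] * 30 + [("replan", "low-fuel")] * 5 +
         [("retry", "path-blocked")] * 5 + [("retry", "low-fuel")] * 30)
dependencies = find_dependencies(trace)
```

The pairs returned here would be the input to steps 2 through 4, which map them onto plan structures and generate explanations; those steps need knowledge of the planner and are not sketched.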

Analyses  of  failure  recovery  can  contribute  in  several  ways  to  our  understanding  of 
planner performance. FRA can identify contributors to failure and assist in the
debugging and evaluation of planners with incomplete or incorrect domain models.
Additionally, the dependencies provide a measure of similarity between test
situations. The more the environment and agent change, the more one expects
observed effects to change; thus, dependencies can be a kind of similarity measure
across planners and environments.

The lesson from this analysis is that while design changes rarely have isolated
effects,  designers  do  not  have  to  give  up  hope  of  analyzing  the  effects.  They  can  track 
the  effects:  They  make  minor  changes  and  havoc  ensues,  but  they  have  a  way  to 
assess  the  havoc.  Phoenix  is  an  example  of  a  system  that  can  interleave  plans  in 
arbitrary  ways,  as  dictated  by  situation.  Debugging  its  failures  by  "watching  the 
system"  or  by  predicting  all  possible  execution  traces  is  simply  not  feasible,  but 
running  Phoenix  many  times  and  analyzing  the  data  is  feasible.  Failure  recovery 
analysis  isolates  indirect  effects  of  design  changes  and  proposes  explanations  and 
modifications based on a weak model of the planner and its environment; its primary
contribution  is  in  helping  us  understand  how  planning  decisions  and  actions 
interact,  and  assisting  in  debugging  planners  under  development. 

2.1.4. Slack Time Envelopes

During  the  third  contract  year  we  extended  our  work  on  the  real-time  monitoring  and 
control  structure  we  call  envelopes.  We  showed  previously  that  envelopes  could  be 
applied  in  Phoenix  to  compare  the  actual  progress  of  plans  to  the  expected  progress  on 
which  the  plans  were  based  (Hart  et  al.,  1990).  Such  a  comparison  can  be  used  to 
predict  plan  failure  in  advance,  thus  allowing  the  planning  system  a  head  start  in 
responding  to  failure.  More  recently  we  have  shown  that  good  performance  can  be 
achieved  by  hand  constructed  slack  time  envelopes  (Cohen,  St.  Amant  &  Hart  1992), 
and we presented a probabilistic model of progress, from which we derived a method
for automatically constructing slack time envelopes that balances the benefits of early
warning that a plan is failing against the costs of false positives (erroneous
predictions  that  a  plan  is  failing  caused  by  uncertainty  in  our  predictions  or 
serendipitous  events). 

Underlying  the  judgment  that  a  plan  will  not  succeed  is  a  fundamental  tradeoff 
between the cost of an incorrect decision and the cost of evidence that might improve
the decision. For concreteness, let’s say a plan succeeds if a vehicle arrives at its
destination by a deadline, and fails otherwise. At any point in a plan we can correctly or
incorrectly  predict  that  the  plan  will  succeed  or  fail.  If  we  predict  early  in  the  plan 
that  it  will  fail,  and  it  eventually  fails,  then  we  have  a  hit,  but  if  the  plan  eventually 
succeeds  we  have  a  false  positive.  False  positives  might  be  expensive  if  they  lead  to 
replanning. In general, the false positive rate decreases over time (e.g., very few
predictions made immediately before the deadline will be false positives) but the reduction
in false positives must be balanced against the cost of waiting to detect failures.
Ideally, we want to accurately predict failures as early as possible; in practice, we can
have  accuracy  or  early  warnings  but  not  both. 

The false positive rate for a decision rule that at time t predicts failure will generally
decrease as t increases. In work detailed in Section 4, we analyze this tradeoff in
several ways. First, we describe a very simple decision rule, called a slack time
envelope, that we have used for years in the Phoenix planner. Then, using empirical
data from Phoenix, we evaluate the false positive rate for envelopes and show that
envelopes can maintain good performance throughout a plan. An infinite number of
slack time envelopes can be constructed for any plan, and the first analysis in the
paper depends on "good" envelopes constructed by hand. To be generally useful,
envelopes  should  be  constructed  automatically.  This  requires  a  formal  model  of  the 
tradeoff  between  when  a  failure  is  predicted  (earlier  is  better)  and  the  false  positive 
rate of the prediction, shown next in Section 4. Finally we show how the conditional
probability  of  a  plan  failure  given  the  state  of  the  plan  can  be  used  to  construct 
"warning"  envelopes. 
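The tradeoff between early warning and false positives can be explored with a small simulation. The sketch below is illustrative only and does not reproduce the Phoenix envelopes: it assumes one plausible one-parameter form in which the agent may make no progress for `slack` time units, after which the remaining distance must stay at or below a straight line running from (slack, total distance) to (deadline, 0). The progress model (one unit of progress per step with fixed probability) and all parameter values are likewise assumptions.

```python
import random

def envelope_boundary(t, total_distance, deadline, slack):
    """Largest remaining distance tolerated at time t (assumed linear boundary)."""
    if t <= slack:
        return float(total_distance)
    if t >= deadline:
        return 0.0
    return total_distance * (deadline - t) / (deadline - slack)

def predicts_failure(t, remaining, total_distance, deadline, slack):
    """Warn when the plan lies outside the envelope at time t."""
    return remaining > envelope_boundary(t, total_distance, deadline, slack)

def false_positive_rate(check_time, total_distance=10, deadline=20, slack=4,
                        p_progress=0.6, trials=5000, seed=0):
    """Fraction of warnings issued at check_time that are followed by success."""
    rng = random.Random(seed)
    warnings = false_alarms = 0
    for _ in range(trials):
        remaining, warned = total_distance, False
        for t in range(1, deadline + 1):
            if rng.random() < p_progress:
                remaining -= 1           # one unit of progress this step
            if t == check_time:
                warned = predicts_failure(t, remaining, total_distance,
                                          deadline, slack)
            if remaining <= 0:
                break                    # goal reached before the deadline
        if warned:
            warnings += 1
            false_alarms += remaining <= 0
    return false_alarms / warnings if warnings else None
```

Calling `false_positive_rate` for several values of `check_time` and `slack` gives a rough empirical picture of how the false positive rate changes as warnings are issued earlier or later under this toy model.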

Although  we  rely  heavily  on  slack  time  envelopes  in  the  Phoenix  planner,  we  have 
always  constructed  them  by  heuristic  criteria,  and  we  did  not  know  how  to  evaluate 
their  performance.  In  this  work  we  showed  that  good  performance  can  be  achieved  by 
hand-constructed  slack  time  envelopes,  and  we  presented  a  probabilistic  model  of 
progress,  from  which  we  derived  a  method  for  automatically  constructing  slack  time 
envelopes that balances the benefits of early warnings against the costs of false
positives. Our contribution has been to cast the problem in probabilistic terms and to
develop  a  framework  for  evaluation.  We  are  presently  extending  our  work  to  other 
models  of  progress  and  different,  more  complex  domains. 

2.1.5. Timing is Everything: A Theoretical Look at Plan Execution Monitoring

The  value  of  actions  depends  on  their  timing.  Actions  that  interact  with  processes  are 
more  or  less  effective  depending  on  when  they  occur.  We  are  familiar  with  the  idea  of 
a  window  of  opportunity,  but  it  isn't  always  clear  how  to  recognize  such  a  window 
before  it  closes.  Sometimes  it  is  advantageous  to  monitor  processes  to  detect  windows. 
In  (Cohen  1991a)  we  describe  several  timing  problems;  some  require  monitoring  and 
some don’t. We consider two monitoring strategies, a periodic strategy for monitoring
for  fires  in  Phoenix,  and  a  monitoring  strategy  for  predicting  when  a  task  will  finish. 
We  have  proved  the  optimality  of  the  former,  and  "mature"  humans  engage  in  the 
latter, although it isn't necessarily optimal. We also describe one case in which the
cost  functions  for  two  processes  are  combined.  Cohen  (1991a)  ends  by  suggesting  that 
the  large  and  somewhat  bewildering  range  of  timing  problems  might  be  described  by 
relatively  few  features. 

These  features  can  be  composed  to  form  what  we  think  is  a  preliminary  taxonomy  of 
monitoring  problems.  We  have  begun  to  establish  this  taxonomy,  and  have  set  about 
tackling  some  of  the  constituent  monitoring  problems.  Some  of  these  efforts  are 
described  in  this  section.  First  we  report  on  a  survey  of  the  literature  on  plan 
execution  monitoring.  Next  we  discuss  an  optimal  (though  expensive)  monitoring 
strategy  for  predicting  whether  a  task  will  meet  a  deadline  (the  envelope  problem). 

This  strategy  works  only  if  we  have  a  model  of  the  process  being  monitored.  In  the 
final  part  of  this  section  we  discuss  a  method  for  learning  an  optimal  monitoring 
strategy  for  tasks  with  deadlines  when  no  model  of  the  task  is  available. 

Monitoring Plan Execution: A Short Survey. In Hansen & Cohen (1992c) we survey
the progress that has been made on the problem of monitoring plan execution in the
twenty years since the first, simple scheme was developed (Fikes 1971; Fikes, Hart &
Nilsson,  1972).  Although  the  research  team  that  built  the  Shakey  robot  gave  as  much 
attention  to  the  problem  of  execution  monitoring  as  they  did  to  the  problem  of  plan 
generation, for more than a decade afterward the research community focused
almost exclusively on the problem of plan generation. The problem of execution
monitoring was regarded, at best, as a side-issue. However, the last few years have
seen a revived interest in plan execution systems, an interest spurred by the desire to
build agents that can operate effectively in complex, changing environments. With
this in mind, it seems worthwhile to collect together in one place pointers to the
scattered work that has been done on this subject, to provide some structure to it, and
to analyze the issues it raises. While many good surveys of plan generation have been
written, until now no comparable survey has been made of the work that has been
done on execution monitoring.¹

As a broad characterization, execution monitoring is a way of dealing with
uncertainty about the effect of executing a plan. In general, there are two reasons for
monitoring plan execution. The more basic one, which might be called "monitoring for
failures",  simply  consists  of  checking  that  a  plan  works,  that  actions  have  their 
intended  effect.  The  second  might  be  called  "monitoring  for  opportunities".  It 
involves  checking  for  things  that  need  to  be  done,  for  events  that  need  to  be  responded 
to.  It  can  also  involve  checking  for  shortcuts  or  "optimizations"  to  a  plan  that  may 
become available as the plan executes. In this case, the question is not whether a plan
works  but  whether  it  could  be  made  better. 

This  survey  addresses  the  following  broad  questions: 

•  What  to  Monitor.  We  begin  by  considering  the  question  of  what  to  monitor.  An 
answer  is  to  monitor  what  is  relevant,  and  to  determine  what  is  relevant  by  using 
the  dependency  structure  of  the  plan. 

• When to Monitor. There are two ways in which the question of when and how
often to monitor has been addressed. One is by analyzing the uncertainty
introduced into a plan by an agent's own actions, and triggering a monitoring action
whenever the cumulative uncertainty exceeds some threshold. The other is by
modeling the rate of change of the process in the environment being monitored,
and setting the monitoring frequency to reflect that rate of change.

•  Monitoring  and  Sensing.  There  is  a  close  relationship  between  monitoring  and 
sensing,  so  much  so  that  it  can  seem  natural  to  identify  the  two,  to  say  they  are  one 
and the same. However, in a number of schemes for execution monitoring,
monitoring and sensing are distinguished.

•  Architectures  for  Monitoring.  After  investigating  the  question  of  what 
conditions  to  monitor,  as  well  as  when  and  how  often  to  monitor  them,  we  look  at 
the relationship between monitoring and sensing. The latter question brings us to
the point of considering how schemes for monitoring affect, and are affected by,
the design of an agent architecture, a subject we could refer to as "how to
monitor".

Monitoring to Predict Whether a Task Will Meet a Deadline. Let us consider the
problem of checking a decision rule that predicts whether a task will finish by a
deadline. One instance of this is envelopes, which are implemented as part of our
Phoenix planner (Hart, Anderson & Cohen, 1990). The idea of using a decision rule
such as an envelope to anticipate failure to meet a deadline soon enough ahead of time
to initiate recovery has wide application, especially for real-time computing. AI
systems that do approximate processing under time pressure monitor progress so
that they can adjust their processing strategy to make sure they generate at least an
approximate solution by a deadline (Lesser, Pavlin & Durfee, 1988). Similarly,
dynamic schedulers for real-time operating systems monitor task execution so that
they can anticipate failure to meet a task deadline as soon as possible and take
appropriate action (Haben & Shin, 1990).

¹ This survey, to appear in AI Magazine, is part of Eric Hansen's Master's project. Our ongoing work on
monitoring is being supported under an Augmentation Award for Science and Engineering Research
Training.

Given  a  decision  rule  (such  as  an  envelope)  that  predicts  whether  or  not  a  deadline 
will  be  met,  it  remains  to  be  decided  how  often  this  rule  should  be  tested  (in  other 
words,  how  often  should  the  envelope  be  monitored).  If  monitoring  has  no  cost,  it  can 
be tested continuously. But if it has a cost, there must be a scheduling policy for it.
When  we  built  the  Phoenix  planner  we  assumed  a  periodic  strategy  for  monitoring. 
This was purely ad hoc; we monitored the envelope to check performance every fifteen
minutes  of  simulated  time,  without  regard  for  the  cost  of  making  each  check.  This  is 
also  true  in  Haben  &  Shin's  dynamic  scheduling  system,  which  assumes  there  is  no 
cost  of  monitoring. 

To solve what we will call the "envelope monitoring problem" using stochastic
dynamic programming, we first express it in formal mathematical terms. Its state is
represented by a vector of two variables: 1) the time remaining before the deadline,
and 2) the distance remaining to reach the goal condition. There is a single decision
variable, m; the decision is to either stop and abandon the task or to continue
executing it for m additional units of time before monitoring its progress again. The
complete solution is given in Hansen (1993). The obvious qualitative observation to
make is that the frequency of monitoring increases with closeness to the envelope
boundary. This supports the intuition that the more likely one is to cross the
threshold of a decision rule, the more often one should check (or "monitor") the rule.
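A minimal sketch of this formulation, under stated assumptions, is given below. It is not the solution from Hansen (1993). The state is (time left, distance left) and the decision is either to abandon (m = 0) or to run for m more steps before monitoring again. The progress model (one unit of progress per step with fixed probability), the payoff, and the cost values are all hypothetical, and execution cost is charged for all m steps even if the goal is reached earlier, since completion is only noticed at the next monitoring action.

```python
from functools import lru_cache
from math import comb

P_STEP = 0.6         # assumed probability of one unit of progress per time step
PAYOFF = 10.0        # assumed reward for reaching the goal by the deadline
STEP_COST = 0.2      # assumed cost of executing the task for one time step
MONITOR_COST = 0.1   # assumed cost of each monitoring action

@lru_cache(maxsize=None)
def best(time_left, dist_left):
    """Return (expected value, m) of the optimal decision; m == 0 means stop."""
    if dist_left <= 0:
        return PAYOFF, 0             # goal reached by the deadline
    if time_left == 0:
        return 0.0, 0                # deadline passed without reaching the goal
    best_value, best_m = 0.0, 0      # abandoning now is worth 0
    for m in range(1, time_left + 1):
        value = -(STEP_COST * m + MONITOR_COST)
        for k in range(m + 1):       # k units of progress during m steps
            prob = comb(m, k) * P_STEP ** k * (1 - P_STEP) ** (m - k)
            value += prob * best(time_left - m, dist_left - k)[0]
        if value > best_value:
            best_value, best_m = value, m
    return best_value, best_m
```

Tabulating `best(t, d)[1]` over a grid of states shows how long the policy waits before its next check in each state, which is one way to examine the qualitative observation about monitoring frequency under this toy model.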

We are currently working to reconcile this optimal but costly strategy (time complexity
O(n³) and space complexity O(n²)) with the one-parameter slack-time envelope
boundary estimator discussed in Section 2.1.4. Our intuition is that the
one-parameter rule can serve as a cheap approximation of the optimal strategy at times
when there is insufficient time to compute it.

Learning a Decision Rule for Monitoring Tasks with Deadlines. In the preceding
section (and in Hansen 1993) we showed that, given a model of the state transition
probabilities and payoffs, an optimal monitoring policy can be determined using
stochastic dynamic programming. In Hansen and Cohen (1992a) we extend this
work, showing that even without a model, an optimal monitoring policy can be
learned.

A real-time scheduler or planner is responsible for managing tasks with deadlines.
When  the  time  required  to  execute  a  task  is  uncertain,  it  may  be  useful  to  monitor  the 
task  to  predict  whether  it  will  meet  its  deadline;  this  provides  an  opportunity  to  make 
adjustments or else to abandon a task that can't succeed. Hansen (1993) treats
monitoring  a  task  with  a  deadline  as  a  sequential  decision  problem.  Given  an  explicit 
model  of  task  execution  time,  execution  cost,  and  payoff  for  meeting  a  deadline,  an 
optimal  decision  rule  for  monitoring  the  task  can  be  constructed  using  stochastic 
dynamic programming. If a model is not available, the same rule can be learned
using  temporal  difference  methods  (Barto,  Sutton  &  Watkins,  1990).  These  results  are 
significant  because  of  the  importance  of  this  decision  rule  in  real-time  computing. 
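The model-free case can be illustrated with tabular Q-learning, one member of the temporal difference family. This is a sketch under assumptions, not the algorithm of Hansen & Cohen (1992a): the simulated task (one unit of progress per step with fixed probability), the payoff, the step cost, the learning rate, and the use of random start states are all hypothetical choices. The learner sees only sampled transitions and rewards; it never reads the progress probability directly.

```python
import random

def learn_continue_values(max_time=10, max_dist=10, p_step=0.7, payoff=10.0,
                          step_cost=0.5, alpha=0.1, episodes=20000, seed=0):
    """Learn Q(state, continue) by Q-learning; abandoning is worth 0 by definition."""
    rng = random.Random(seed)
    q = {(t, d): 0.0
         for t in range(1, max_time + 1) for d in range(1, max_dist + 1)}
    for _ in range(episodes):
        # Random start state so every (time left, distance left) pair is visited.
        t, d = rng.randint(1, max_time), rng.randint(1, max_dist)
        while True:
            state = (t, d)
            t -= 1                          # executing one step uses up time
            if rng.random() < p_step:
                d -= 1                      # sampled progress from the environment
            reward = -step_cost
            if d == 0:                      # goal reached by the deadline
                target, done = reward + payoff, True
            elif t == 0:                    # deadline reached without success
                target, done = reward, True
            else:                           # bootstrap from the better next action
                target, done = reward + max(0.0, q[(t, d)]), False
            q[state] += alpha * (target - q[state])
            if done:
                break
    return q

def decide(q, time_left, dist_left):
    """Learned decision rule: continue only if continuing beats abandoning."""
    return "continue" if q[(time_left, dist_left)] > 0 else "abandon"
```

For example, after `q = learn_continue_values()`, `decide(q, 2, 5)` evaluates a state in which the goal can no longer be reached, while `decide(q, 5, 1)` evaluates one in which success is likely.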

It makes sense to construct a decision rule such as the one described in Hansen (1993)
for  tasks  that  are  repeated  many  times  or  for  a  class  of  tasks  with  the  same  behavior. 
This  allows  the  rule  to  be  learned,  if  TD  methods  are  relied  on;  or  for  statistics  to  be 
gathered  to  characterize  a  probability  and  cost  model,  if  dynamic  programming  is 
relied  on.  However  if  a  model  is  known  beforehand,  or  can  be  estimated,  a  decision 
rule  can  also  be  constructed  for  a  task  that  executes  only  once. 

The time complexity of the dynamic programming algorithm is O(n²), where n is the
number of time steps from the start of the task to its deadline; however the decision
rule may be compiled once and reused for subsequent tasks. The time complexity of
TD learning, O(n), is mitigated by the possibility of turning learning off and on. The
space  overhead  of  representing  an  evaluation  function  by  a  table  is  avoidable  by  using 
a  more  compact  function  representation,  such  as  a  connectionist  network. 

Besides  the  fact  that  this  approach  is  not  computationally  intensive,  it  has  other 
advantages.  It  is  conceptually  simple.  The  decision  rule  it  constructs  is  optimal,  or 
converges to the optimal in the case of TD learning. It works no matter what
probability model characterizes the execution time of a task and no matter what cost model
applies,  and  so  is  extremely  general.  Finally,  it  works  even  when  no  model  of  the 
state  transition  probabilities  and  costs  is  available,  although  a  model  can  be  taken 
advantage  of. 

These results can be extended in two obvious ways. The first is to factor in a cost for
monitoring.  Our  analysis  thus  far  has  assumed  that  monitoring  has  no  cost,  or  its 
cost  is  negligible.  This  allows  monitoring  to  be  nearly  continuous,  in  effect,  for  a  task 
to  be  monitored  each  time  step.  Others  who  have  developed  similar  decision  rules 
have  also  assumed  that  the  cost  of  monitoring  is  negligible.  However  in  some  cases 
the  cost  of  monitoring  may  be  significant,  thus  we  show  in  another  paper  how  this 
cost  can  be  factored  in  (Hansen  1993).  Once  again  we  use  cinnamic  programming  and 
TD  methods  to  develop  optimal  monitoring  strategies. 


The second way in which this work can be extended is to make the decision rule more
complicated. Here we analyzed a simple example in which the only alternative to
continuing a task is to abandon it. But recovery options may be available as well. A
dynamic scheduler for a real-time operating system is unlikely to have recovery
options available, but an AI planner or problem-solver is almost certain to have them
(Lesser, Pavlin & Durfee, 1988; Howe, 1992). The way to handle the more complicated
decision problem this poses is to regard each recovery option as a separate task
characterized by its own probability model and cost model, so that at any point the expected
value of the option can be computed. Then, instead of choosing between two options,
either continuing a task or abandoning it, the choice includes the recovery options as
well. The rule is simply to choose the option with the highest expected value.
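
The extended rule can be sketched in a few lines of Python. This is only an illustration: the option names, the outcome probabilities, the payoffs and the fixed switching cost are hypothetical values chosen for the example, and each option is reduced to a list of (probability, payoff) pairs standing in for its probability and cost models.

```python
def expected_value(outcomes, cost=0.0):
    """Expected payoff of an option given (probability, payoff) pairs,
    minus a fixed cost of adopting the option."""
    return sum(prob * payoff for prob, payoff in outcomes) - cost


def best_option(options):
    """Return the name of the option with the highest expected value,
    together with the score of every option."""
    scores = {
        name: expected_value(spec["outcomes"], spec.get("cost", 0.0))
        for name, spec in options.items()
    }
    return max(scores, key=scores.get), scores


if __name__ == "__main__":
    # Hypothetical numbers for one decision point during task execution.
    options = {
        "continue": {"outcomes": [(0.3, 10.0), (0.7, -5.0)]},
        "abandon": {"outcomes": [(1.0, 0.0)]},
        "recover": {"outcomes": [(0.6, 10.0), (0.4, -5.0)], "cost": 1.0},
    }
    choice, scores = best_option(options)
    print(choice, scores)
```

Treating abandonment as just another option with a known payoff keeps the rule uniform: adding a recovery option only adds one more entry to the comparison.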

The  work  described  in  this  and  the  preceding  subsection  constitutes  the  second  half  of 
Eric  Hansen's  Master's  project.  Section  5  presents  a  more  detailed  overview  of  this 
work. 

2.1.6.  Causal  Modeling  using  Path  Analysis 

During  the  third  contract  year  we  decided  to  baseline  the  real-time  performance  of  the 
Phoenix  Fireboss  to  help  us  design  real-time  scheduling  algorithms  for  its  cognitive 
activities.  We  undertook  the  experiment  described  in  Section  6  (Hart  &  Cohen  1992)  to 
measure  how  changes  in  the  Fireboss's  thinking  speed  affected  its  real-time 
performance. To analyze the results we used a statistical modeling technique called
path  analysis  that  we  feel  holds  great  promise  as  a  method  for  building  causal  models 
from  empirical  observations  of  AI  planner  behavior. 

It is difficult to predict or even explain the behavior of any but the simplest AI
programs. A program will solve one problem readily, but make a complete hash of an
apparently similar problem. For example, our Phoenix planner, which fights simulated
forest fires, will contain one fire in a matter of hours but fail to contain another
under very similar conditions. We therefore hesitate to claim that the Phoenix planner
"works." The claim would not be very informative, anyway; we would much
rather be able to predict and explain Phoenix's behavior in a wide range of conditions
(Cohen 1991b). In Section 6 we describe an experiment with Phoenix in which we
uncover factors that affect the planner's behavior and test predictions about the
planner's robustness against variations in some factors (Hart & Cohen 1992). We also
introduce a technique, path analysis, for constructing and testing causal
explanations of the planner's behavior. Our results are specific to the Phoenix
planner and will not necessarily generalize to other planners or environments, but
our techniques are general and should enable others to derive comparable results for
themselves.

We designed an experiment with two purposes. A confirmatory purpose was to test
predictions that the planner's performance is sensitive to some environmental
conditions but not others. In particular, we expected performance to degrade when we
change a fundamental relationship between the planner and its environment (the
amount of time the planner is allowed to think relative to the rate at which the
environment changes) and not be sensitive to common dynamics in the environment
such as weather, and particularly, wind speed. We tested two specific predictions: 1)
that performance would not degrade or would degrade gracefully as wind speed
increased; and 2) that the planner would not be robust to changes in the Fireboss's
thinking speed due to a bottleneck problem described below. An exploratory purpose
of the experiment was to identify the factors in the Fireboss architecture and Phoenix
environment that most affected the planner's behavior, leading to a causal model of
the time required to put out a fire.

In  order  to  illustrate  the  usefulness  of  path  analysis  for  modeling  causal 
relationships,  it  is  necessary  to  delve  a  little  bit  into  the  workings  of  the  Phoenix 
planner.  The  Fireboss  must  select  plans,  instantiate  them,  dispatch  agents  and 
monitor  their  progress,  and  respond  to  plan  failures  as  the  fire  burns.  The  rate  at 
which  the  Fireboss  thinks  is  determined  by  a  parameter  called  the  Real  Time  Knob. 

By  adjusting  the  Real  Time  Knob  we  allow  more  or  less  simulation  time  to  elapse  per 
unit  CPU  time,  effectively  adjusting  the  speed  at  which  the  Fireboss  thinks  relative  to 
the  rate  at  which  the  environment  changes. 

The Fireboss services bulldozer requests for assignments, providing each bulldozer
with a task directive for each new fireline segment it builds. The Fireboss can become
a bottleneck when the arrival rate of bulldozer task requests is high or when its
thinking speed is slowed by adjusting the Real Time Knob. This bottleneck sometimes
causes the overall digging rate to fall below that required to complete the fireline
polygon before the fire reaches it, which causes replanning. In the worst case, a
Fireboss bottleneck can cause a thrashing effect in which plan failures occur
repeatedly because the Fireboss can't assign bulldozers during replanning fast enough to
keep the overall digging rate at effective levels. We designed our experiment to
explore the effects of this bottleneck on system performance and to confirm our
prediction that performance would vary in proportion to the manipulation of thinking speed.
Because the current design of the Fireboss is not sensitive to changes in thinking
speed, we expect it to take longer to fight fires and to fail more often to contain them as
thinking speed slows.
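
The bottleneck argument can be made concrete with a back-of-the-envelope bound. The Python sketch below is not the Phoenix implementation. It assumes a closed system in which each of a fixed number of bulldozers alternates between digging one segment and waiting for a new directive, and in which the Fireboss serves requests one at a time. The function name, its parameters and the numbers used are hypothetical.

```python
def digging_rate_bound(n_bulldozers, dig_time, service_time):
    """Upper bound on fireline segments completed per unit of simulation time.

    Hypothetical toy model:
      n_bulldozers -- bulldozers cycling between digging and requesting work
      dig_time     -- simulation time to dig one segment
      service_time -- simulation time the Fireboss needs per request
                      (grows as the Fireboss's thinking speed is slowed)
    """
    if service_time <= 0:
        return n_bulldozers / dig_time
    # Each bulldozer needs at least dig_time + service_time per segment.
    cycle_limit = n_bulldozers / (dig_time + service_time)
    # A single Fireboss can issue at most one directive per service_time.
    server_limit = 1.0 / service_time
    return min(cycle_limit, server_limit)


if __name__ == "__main__":
    for service_time in (1.0, 2.0, 5.0, 10.0):
        rate = digging_rate_bound(4, 10.0, service_time)
        print(service_time, round(rate, 3))
```

In this toy model the bound is set by the bulldozers while service is fast and by the Fireboss once service is slow enough, which is the qualitative shift the experiment was designed to probe.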

In contrast, we expect Phoenix to be able to fight fires at different wind speeds. It
might take longer and sacrifice more area burned at high wind speeds, but we expect
this effect to be proportional as wind speed increases and we expect Phoenix to
succeed equally often at a range of wind speeds, since it was designed to do so.

In Section 6 we show that performance did indeed degrade as we systematically
slowed Fireboss thinking speed. Interestingly, this degradation was not linear (with
respect to the time required to contain the fire). We tried using multiple regression to
model the factors that determine this nonlinear relationship, but found that while we
could derive a predictive model, such a regression model doesn't allow us to explain
the inter-related causal influences among the factors. We were able to apply path
analysis (Li 1975; Asher 1983) to build a model that is both predictive and explanatory,
and which tells us (among other things) how Phoenix performance will be affected by
changes in the amount of thinking time available to the Fireboss.

Path analysis is a generalization of multiple linear regression that builds models with
causal interpretations. It is an exploratory or discovery procedure for finding causal
structure in correlational data. In the months since this contract has terminated we
have continued this work, applying path analysis to the problem of building models of
AI programs, which are generally complex and poorly understood. Path analysis has
a huge search space, however. If one measures N parameters of a system, then one
can build O(2^(N^2)) causal models relating these parameters. For this reason, we are
currently developing an algorithm that heuristically searches the space of causal
models.
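
To show what a causal interpretation adds to regression, here is a minimal Python sketch of the simplest path model: one factor X1 influences an outcome Y both directly and through a mediating factor X2. The path coefficients are the standardized regression weights, computed here from the three pairwise correlations. The variable roles and the correlation values are hypothetical and are not results from the Phoenix experiment.

```python
def standardized_betas(r_x1_y, r_x2_y, r_x1_x2):
    """Standardized regression weights of Y on X1 and X2, computed from
    the pairwise correlations (two-predictor normal equations)."""
    denom = 1.0 - r_x1_x2 ** 2
    if denom <= 0.0:
        raise ValueError("predictors are perfectly correlated")
    b1 = (r_x1_y - r_x2_y * r_x1_x2) / denom
    b2 = (r_x2_y - r_x1_y * r_x1_x2) / denom
    return b1, b2


def decompose(r_x1_y, r_x2_y, r_x1_x2):
    """Split the X1-Y correlation for the model X1 -> X2 -> Y, X1 -> Y
    into a direct effect and an indirect effect through X2."""
    direct, via_x2 = standardized_betas(r_x1_y, r_x2_y, r_x1_x2)
    indirect = r_x1_x2 * via_x2  # path X1 -> X2 times path X2 -> Y
    return {"direct": direct, "indirect": indirect,
            "total": direct + indirect}


if __name__ == "__main__":
    # Hypothetical correlations: X1 could stand for thinking speed, X2 for
    # the number of replanning episodes, Y for time to contain the fire.
    print(decompose(r_x1_y=0.5, r_x2_y=0.6, r_x1_x2=0.4))
```

For this model the direct and indirect effects sum to the observed correlation between X1 and Y, so the decomposition says how much of the association flows along each causal route, something a regression equation alone does not report.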

2.1.7.  Continuing  Methodological  Development 

The design of AI systems is typically justified informally. For example, one might say,
"The planner is designed to be reactive because the environment changes rapidly in
unexpected ways." We believe this style of justification is too informal to support (a)
demonstrations of the necessity of a design, (b) evaluation of the design, (c)
generalization of the design to other tasks and environments, (d) communication of
the design to other researchers, and (e) comparisons between designs. Through our
research program we seek to demonstrate that achieving these goals is a natural
consequence of basing designs on formal models of the interactions between agents
and their environments. The methodology we have developed for this purpose we call
modeling, analysis and design (MAD) (Cohen 1991b).

While the development of this methodology was funded under other contracts, we
have consistently applied it to our work in Phoenix (see previous Annual Technical
Reports). This provides us with a rich source of examples, from models of the task
environment (Hansen 1990a, Hansen 1990b, Silvey 1990) to models of the Phoenix
agent architecture (Cohen 1990, Anderson et al. 1991, Hart & Cohen 1992) to the
design of experiments to evaluate planning system behavior (Howe & Cohen 1991).
These examples have in turn been incorporated into presentations of our
methodological approach in numerous forums, from conferences (see Conferences,
Workshops and Presentations) to magazine articles (Cohen 1991b), and finally to a
textbook and graduate course curriculum on AI methodology (Cohen forthcoming).
Some examples of these activities include:

• Presentations. Paul Cohen was invited to deliver keynote addresses on
methodological issues at a conference and an AAAI Spring Symposium (see
Invited Presentations). He also participated in the Workshop on Research in
Experimental Computer Science, the goal of which was to identify issues and
problems arising in experimental work in the entire field of Computer Science.
Sponsored by ONR, DARPA, and NSF, this workshop was held in Palo Alto, CA,
October 16-18, 1991.

• Workshop on AI Methodology. Held in June of 1991, this workshop, sponsored
jointly by DARPA and NSF, brought together leading AI researchers to discuss
growing methodological concerns and develop a consensual strategy for
addressing them.

• Agentology Curriculum. During the summer of 1991 we conducted a summer
school designed to develop the skills in our graduate students needed to
conduct MAD research, and believe that this effort has laid the groundwork for a
curriculum in agentology: the principled design of autonomous agents for
complex environments. From that summer school we have developed a
research methods course for AI graduate students and are working on an
accompanying textbook on Experimental Methods for AI Research.

• AAAI Tutorial on Experimental Methods for AI Research. This tutorial was
offered jointly with Prof. Bruce Porter (Univ. of Texas, Austin) at AAAI-92, and
will be offered again at AAAI-93.


A Textbook for Empirical Methods in Artificial Intelligence. The activities listed
above are culminating now in a textbook being prepared for use in graduate AI
methods courses. Entitled "Empirical Methods for Artificial Intelligence," this
textbook is a primer for the empirical evaluation of the new generation of agents being
designed by AI researchers. A prospectus for the textbook appears in Appendix A.

2.2. Publications

2.2.1. Refereed Papers Published

Anderson, S.D., Hart, D.M. & Cohen, P.R. Two ways to act. AAAI Spring
Symposium on Integrated Intelligent Architectures. Published in the SIGART
Bulletin, 2(4):20-24. 1991.

Cohen, P.R. & Hart, D.M. Path analysis models of an autonomous agent in a complex
environment. To appear in Proceedings of the Fourth International Workshop on
AI and Statistics. 1993.

Cohen, P.R., St. Amant, R. & Hart, D.M. Early warning of plan failure, false
positives and envelopes: Experiments and a model. Proceedings of the Fourteenth
Annual Conference of the Cognitive Science Society. Lawrence Erlbaum
Associates, Inc. 1992. Pp. 773-778.

Cohen,  P.R.  A  Survey  of  the  Eighth  National  Conference  on  Artificial  Intelligence: 
Pulling  together  or  pulling  apart?  AI  Magazine,  12(1),  16-41. 

Cohen, P.R. Designing and analyzing strategies for Phoenix from models.
Proceedings of the Workshop on Innovative Approaches to Planning, Scheduling,
and Control, Katia Sycara (Ed.). Morgan Kaufmann, 1990. Pp. 9-21.

Cohen, P.R., Greenberg, M.L., Hart, D.M., & Howe, A.E. Real-time problem solving
in the Phoenix environment. Proceedings of the Workshop on Real-time Artificial
Intelligence Problems at the Eleventh International Joint Conference on Artificial
Intelligence, Detroit, Michigan, 1989.

Cohen, P.R., Greenberg, M.L., Hart, D.M., & Howe, A.E. Trial by fire:
Understanding the design requirements for agents in complex environments.
Reprinted in Nikkei Artificial Intelligence, 102-119, Nikkei Business Publications,
Inc., 1990. (Originally published in AI Magazine, 32-48, Fall 1989.)

Hart,  D.M.  &  Cohen,  P.R.  Predicting  and  explaining  success  and  task  duration  in  the 
Phoenix  planner.  Proceedings  of  the  First  International  Conference  on  AI 
Planning  Systems.  Morgan  Kaufmann.  1992.  Pp.  106-115. 

Hart, D.M., Anderson, S.D., & Cohen, P.R. Envelopes as a vehicle for improving the
efficiency of plan execution. Proceedings of the Workshop on Innovative
Approaches to Planning, Scheduling, and Control, Katia Sycara (Ed.). Morgan
Kaufmann, 1990. Pp. 71-76.

Howe, A.E. & Cohen, P.R. Detecting and explaining dependencies in execution
traces. To appear in Proceedings of the Fourth International Workshop on AI and
Statistics. 1993.

Howe, A.E. Isolating dependencies on failure by analyzing execution traces.
Proceedings of the First International Conference on AI Planning Systems.
Morgan Kaufmann. 1992. Pp. 277-278.

Howe, A.E. Analyzing failure recovery to improve planner design. Proceedings of the
Tenth National Conference on Artificial Intelligence. AAAI Press/MIT Press.
1992. Pp. 387-392.

Howe, A.E. & Cohen, P.R. Failure recovery: A model and experiments. Proceedings
of the Ninth National Conference on Artificial Intelligence. Pasadena, CA. July
1991. Pp. 801-808.

Howe, A.E., Hart, D.M. & Cohen, P.R. Addressing real-time constraints in the
design of autonomous agents. The Journal of Real-Time Systems, 1(1/2):81-97.
1990.

Howe, A.E. & Cohen, P.R. Responding to environmental change. Proceedings of the
Workshop on Innovative Approaches to Planning, Scheduling, and Control, Katia
Sycara (Ed.). Morgan Kaufmann, 1990. Pp. 85-92.

Powell, Gerald M. and Cohen, Paul R. Operational planning and monitoring with
envelopes. Proceedings of the Fifth Annual AI Systems in Government
Conference. 1990.

2.2.2.  Refereed  Papers  Submitted 

Hanks, S., Pollack, M.E. & Cohen, P.R. Benchmarks, testbeds, controlled
experimentation, and the design of agent architectures. To appear in AI Magazine.

Howe, A.E. Improving the reliability of AI planning systems by analyzing their
failure recovery. Submitted to IEEE Transactions on Knowledge and Data
Engineering.

Howe,  A.E.  &  Cohen,  P.R.  Understanding  planner  behavior.  Submitted  to  the 
Artificial  Intelligence  Journal  Special  Issue  on  Planning  and  Scheduling  (D. 
McDermott  &  J.  Hendler,  eds.). 

2.2.3.  Invited  Papers  Published 

Cohen, P.R. Methodological problems, a model-based design and analysis
methodology, and an example. Proceedings of the International Symposium on
Methodologies for Intelligent Systems. Pp. 33-50. Knoxville, TN, Oct. 25-27, 1990.

2.2.4. Refereed Workshop Abstracts and Symposia Papers

Cohen, P.R., Anderson, S.D., & Hart, D.M. Scheduling agent actions in real-time.
Abstract for The Interdisciplinary Workshop on the Design Principles for Real-Time
Knowledge Based Control Systems at the Eighth National Conference on
Artificial Intelligence. Boston, MA, 1990.

Cohen, P.R. & Howe, A.E. Benchmarks are not enough; Evaluation metrics depend
on the hypothesis. Collected Notes from the Benchmarks and Metrics Workshop.
Technical Report FIA-91-06, NASA Ames Research Center. Pp. 18-19. 1990.

Hart, David M. and Cohen, Paul R. Phoenix: A testbed for shared planning research.
Collected Notes from the Benchmarks and Metrics Workshop. Technical Report
FIA-91-06, NASA Ames Research Center. Pp. 20-27. 1990.

Howe, A.E. Evaluating planning through simulation: An example using Phoenix.
Working Notes of AAAI Spring Symposium on Foundations of Classical
Planning. Palo Alto, CA, March 1993.

Howe, A.E. Failure recovery analysis as a tool for plan debugging. In Working Notes
of the AAAI Spring Symposium on Computational Considerations in Supporting
Incremental Modification. Palo Alto, CA. March 26-27, 1992.

Howe, A.E., Hart, D.M. & Cohen, P.R. Designing agents to plan and act in their
environments. Abstract for The Workshop on Automated Planning for Complex
Domains at the Eighth National Conference on Artificial Intelligence. Boston,
MA, 1990.

Howe, Adele E. Integrating adaptation with planning to improve behavior in
unpredictable environments. In Planning in Uncertain, Unpredictable, or
Changing Environments: Working Notes of the 1990 AAAI Spring Symposium.
Also, Technical Research Report i/OO-lG, Systems Research Center, University of
Maryland, 1990.

Silvey, P.E., Loiselle, C.L. & Cohen, P.R. Intelligent data analysis. Working Notes of
the AAAI-92 Fall Symposium on Intelligent Scientific Computation. Cambridge,
MA. October 23-24, 1992.

2.2.5.  Books  or  Parts  Thereof  Published 

Cohen, P.R. Empirical Methods for Artificial Intelligence. Textbook in preparation.

Cohen,  P.R.  Architectures  for  Reasoning  Under  Uncertainty.  1990.  Readings  in 
Uncertain  Reasoning.  Glenn  Shafer  and  Judea  Pearl,  Eds.,  Morgan-Kaufmann. 

Howe, Adele E. and Cohen, Paul R. How evaluation guides AI research. Reprinted in
A Sourcebook of Applied Artificial Intelligence. Gerald Hopple and Stephen
Andriole, Eds. TAB Books, Inc. 1990. (Originally published in AI Magazine,
Winter, 1988.)


2.2.6. Ph.D. Dissertations

Howe,  A.E.  Accepting  the  Inevitable:  The  Role  of  Failure  Recovery  in  the  Design  of 
Planners.  February,  1993. 

2.2.7.  Unrefereed  Reports  and  Articles 

Anderson, S.D. & Hart, D.M. Monitoring interval. EKSL Memo #11. Experimental
Knowledge Systems Laboratory, Dept. of Computer Science, Univ. of
Massachusetts, Amherst. 1990.

Cohen, P.R., Hart, D.M., & Devadoss, J.K. Models and experiments to probe the
factors that affect plan completion times for multiple fires in Phoenix. EKSL
Memo #17. Experimental Knowledge Systems Laboratory, Dept. of Computer
Science, Univ. of Massachusetts, Amherst. 1990.

Fisher, D.E. Common Lisp Analytical Statistics Package (CLASP): User manual.
Technical Report 90-85, Dept. of Computer Science, Univ. of Massachusetts,
Amherst. Revised and expanded, 1991.

Greenberg, M.L. & Westbrook, D.L. The Phoenix testbed. Technical Report 90-19,
Dept. of Computer Science, Univ. of Massachusetts, Amherst, MA, 1990.

Hansen, E.A. & Cohen, P.R. Learning a decision rule for monitoring tasks with
deadlines. Technical Report #92-80, Dept. of Computer Science, Univ. of
Massachusetts, Amherst. 1992.

Hansen, E.A. The effect of wind on fire spread. EKSL Memo #10. Experimental
Knowledge Systems Laboratory, Dept. of Computer Science, Univ. of
Massachusetts, Amherst. 1990.

Hansen, E.A. A model for wind in Phoenix. EKSL Memo #12. Experimental
Knowledge Systems Laboratory, Dept. of Computer Science, Univ. of
Massachusetts, Amherst. 1990.

Howe, A.E. & Cohen, P.R. Debugging plan failures by analyzing execution traces.
Technical Report #92-22, Dept. of Computer Science, Univ. of Massachusetts,
Amherst. 1992.

Howe, A.E. Did we measure what we thought?: Problems with the melhnii cri-.l
measure. EKSL Memo #16. Experimental Knowledge Systems Laboratory, Dept.
of Computer Science, Univ. of Massachusetts at Amherst. 1991.

Silvey, P.E. Phoenix baseline fire spread models. EKSL Memo #13. Experimental
Knowledge Systems Laboratory, Dept. of Computer Science, Univ. of
Massachusetts, Amherst. 1990.

2.3. Conferences, Workshops and Presentations

2.3.1. Invited Presentations

Cohen, P.R.

• Member of a panel entitled "Planning under uncertainty" at the AAAI Workshop
on Production Planning, Scheduling and Control, which focused on scheduling
strategies for managing uncertainty in complex, real-time environments, July
1992.

•  Invited  presentation  on  EKSL's  current  research  at  a  two  day  meeting  of  the 
Institute  for  Defense  Analysis's  Information  and  Science  Technology  Advisory 
Group  on  Simulation  in  Washington,  DC,  June  1992. 

• Member of a panel entitled "The empirical evaluation of planning systems:
Promises and pitfalls" at the First International Conference on AI Planning
Systems at the Univ. of Maryland, June 1992.

• Methods for agentology: General concerns, specific examples. Invited talks at
Virginia Polytechnic Institute and the Univ. of West Virginia. April 1992.

•  Three  examples  of  statistical  modeling  of  an  AI  program.  Invited  talk  at  the 
Univ.  of  Texas,  Austin.  March  1992. 

•  Member  of  a  panel  entitled  "The  future  of  expert  systems"  chaired  by  Dr.  Y.T. 
Chen  of  NSF  at  the  World  Congress  on  Expert  Systems,  December  1991,  in 
Orlando,  Florida. 

• A brief report on a survey of AAAI-90, some methodological conclusions, and an
example of the MAD methodology in Phoenix. Keynote address, AAAI Spring
Symposium on Implemented AI Systems. Palo Alto, CA. March, 1991.

• Methodological problems, a model-based design and analysis methodology, and
an example. Keynote address at the International Symposium on Methodologies
for Intelligent Systems. Knoxville, TN. October 1990.

• What is an interesting environment for AI planning research? Panel
moderator, Workshop on Automated Planning for Complex Domains at the Eighth
National Conference on Artificial Intelligence. Boston, July 1990.

•  Modeling  for  AI  system  design.  Imperial  Cancer  Research  Fund,  London, 
England.  June  1990. 

• Modelling for AI system design. Digital Equipment Corporation, Galway,
Ireland. June 1990.

• Fire will destroy the pestilence, or, How natural environments will drive out bad
methodology. Texas Instruments, Dallas. May 1990.

• Designing autonomous agents. Thayer School of Engineering, Dartmouth
College. November Ife-.O,

Howe, A.E.

"Accepting the Inevitable: The Role of Failure Recovery in the Design of Planners".

• Dept. of Computer Science, Oregon State Univ., March 1992.

• Dept. of Computer Sciences, Purdue Univ., March 1992.

•  Computer  Science  Dept.,  Univ.  of  Maryland,  Baltimore  County,  March  1992. 

•  Computer  Science  Dept.,  Colorado  State  Univ.,  April  1992. 

• Dept. of Electrical and Computer Engineering, Clarkson Univ., April 1992.

•  School  of  Computer  Science,  Carnegie  Mellon  Univ.,  April  1992. 

2.3.2. Contributed Presentations

Cohen,  P.R. 

•  Welcoming  address  (untitled)  at  the  NSF/DARPA  Workshop  on  Artificial 
Intelligence  Methodology.  Northampton,  MA.  June  1991. 

• The Phoenix project: Responding to environmental change. Workshop on
Innovative Approaches to Planning, Scheduling, and Control. San Diego, CA.
November 1990.

• Intelligent real-time problem solving: Issues and examples. Presented at the
Intelligent Real-Time Problem Solving Workshop, Santa Cruz, CA, November
1989.

Hart,  D.M. 

• Predicting and explaining success and task duration in the Phoenix planner.
Paper presentation at the First International Conference on AI Planning
Systems at the University of Maryland, June 1992.

Howe,  A.E. 

• Detecting and explaining dependencies. Poster presented at the Fourth
International Workshop on AI and Statistics, Ft. Lauderdale, FL. January 1993.

• Analyzing failure recovery to improve planner design. Paper presented at the
Tenth National Conference on Artificial Intelligence. San Jose, CA. July 1992.

• Isolating dependencies on failure by analyzing execution traces. Poster
presentation at the First International Conference on AI Planning Systems at
the University of Maryland, June 1992.

• Failure recovery analysis as a tool for plan debugging. AAAI Spring
Symposium on Computational Considerations in Supporting Incremental
Modification. Palo Alto, CA. March 1992.

•  Failure  recovery:  A  model  and  experiments.  Paper  presented  at  the  Ninth 
National  Conference  on  Artificial  Intelligence.  Pasadena,  CA.  July  1991. 

• Adaptable planning in the Phoenix system. Poster presentation at the
Symposium on Learning Methods for Planning and Scheduling, Palo Alto, CA.
January 1991.

• Designing agents to plan and act in their environments. Workshop on
Automated Planning for Complex Domains at the Eighth National Conference
on Artificial Intelligence. Boston, MA. July 1990.

• Integrating adaptation with planning to improve behavior in unpredictable
environments. Planning in Uncertain, Unpredictable, or Changing
Environments, AAAI Spring Symposium, Palo Alto, CA, March 1990.

Powell, G.M.

• Operational planning and monitoring with envelopes. IEEE Fifth Annual AI
Systems in Government Conference. Washington, DC. May 1990.

2.3.3. Tutorials

Cohen,  P.R. 

• Offered a tutorial with Bruce Porter (of the University of Texas) at the Tenth
National Conference on Artificial Intelligence entitled "Experimental Methods
in Artificial Intelligence." This tutorial used several examples from our
research under this contract, including the results reported in (Hart & Cohen
1992) and (Cohen, St. Amant & Hart, 1992). This tutorial will be offered again at
the Eleventh National Conference on Artificial Intelligence in 1993.

Hart,  D.M. 

•  Tutorial  on  the  Phoenix  Testbed  and  real-time  research  being  conducted  in 
Phoenix  at  Wright  Patterson  AFB,  April,  1991.  This  tutorial  was  offered  for 
potential  consumers  of  IRTPS  research  results. 


2.4. Awards, Promotions, Honors

Cohen,  P.R. 

• Elected a Fellow of the American Association for Artificial Intelligence.

•  Elected  a  Councilor  of  the  American  Association  for  Artificial  Intelligence  for 
the  term  1991-94. 

• Appointed to the Information and Science Technology Advisory Group on
Simulation, Institute for Defense Analysis.

• As an AAAI Councilor, served as Chair of the AAAI-93 Tutorial Committee,
Co-chair of the 1992-93 Symposium Committee, and Assistant to the Chair for
the Program Committee of AAAI-93.

• Chairman, NSF/DARPA Workshop on AI Methodology. University of
Massachusetts. June, 1991.

• Organizing Committee, AAAI Workshop on Intelligent Real-Time Problem
Solving. Anaheim, CA. July, 1991.

• Program Committee, Sixth International Symposium on Methodologies for
Intelligent Systems (ISMIS'91). Charlotte, NC. October 1991.

Hansen,  E.A. 

•  Recipient  of  an  ARPA/AFOSR  Augmentation  Award  for  Science  and 
Engineering  Research  Training  for  an  investigation  of  monitoring  strategies 
related  to  EKSL  work  in  pathology  detection  in  Phoenix  and  in  transportation 
planning. 

Adele  E.  Howe 

•  Appointed  Assistant  Professor  of  Computer  Science  at  Colorado  State  University, 
September,  1992. 


2.5. Technology Transfer


1992: DARPA/Rome Labs Planning Initiative

Much of the work we have done in Phoenix is being transferred to the DARPA/Rome
Labs Planning Initiative (PI). This includes the creation of a testbed environment for
controlled experimentation, our ongoing work with envelopes and monitoring, and
the use of path analysis to build causal models of program behavior. Part of the PI
effort involves building a Common Prototyping Environment (CPE) for integrating and
evaluating components of the evolving planning and scheduling architecture. The
CPE will have many of the kinds of testbed features we built into Phoenix and are
building into our simulation of the transportation planning domain.

Using Simulation Testbeds to Design AI Planners. Over the last four years, with
funding from DARPA, URI, and IRTPS, we have created a testbed environment for
Phoenix. This effort included instrumenting the system, baselining the simulated
environment, and providing such facilities as predefined scenarios, scripts, and
primitives for experiment definition and data collection. We integrated the first
version of CLASP¹ into this testbed environment to provide built-in data analysis
capabilities. Since that time we have extended many of these testbed features in
Phoenix and ported them to our simulation of the sea transport domain in the PI.

Many of these features will soon find their way into the CPE under the direction of the
Issues Working Group on Prototyping Environment, Instrumentation and
Methodology. Paul Cohen is the co-chair of this group along with Mark Burstein of
BBN.

Envelopes and Monitoring. Our work with envelopes and monitoring in Phoenix is
directly applicable to the design of pathology demons for the transportation planning
problem that is the domain of the PI. Pathology demons are designed to detect typical
pathologies that arise during the execution of large-scale transportation plans and to
help the user (interacting through informed visualizations) steer the plan around the
pathologies. Envelope-like representations of plan progress tell us whether we are
keeping to the schedule, and our developing theories of appropriate monitoring
strategies tell us how often to monitor and what to watch.
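
As a rough illustration of the envelope idea, the Python sketch below assumes that an envelope is simply a lower bound on the progress a plan must show at each point in time to stay on schedule, and that a monitor checks observed progress against that bound. The linear schedule, the class and function names, and the sample numbers are all hypothetical; none of them are taken from Phoenix or the PI.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Envelope:
    """Hypothetical envelope: the least progress a plan may show at time t
    and still be on schedule, assuming work must accrue linearly."""
    total_work: float  # e.g., tons of cargo the plan must deliver
    deadline: float    # time by which all of the work must be done

    def required_progress(self, t: float) -> float:
        # Linear schedule, capped at the deadline.
        return self.total_work * min(t, self.deadline) / self.deadline

    def violated(self, t: float, progress: float) -> bool:
        # Behind schedule when observed progress is below the bound.
        return progress < self.required_progress(t)


def monitor(envelope: Envelope,
            observations: List[Tuple[float, float]]) -> List[float]:
    """Return the observation times at which the envelope is violated."""
    return [t for t, progress in observations if envelope.violated(t, progress)]


envelope = Envelope(total_work=100.0, deadline=10.0)
late = monitor(envelope, [(2.0, 25.0), (5.0, 40.0), (8.0, 85.0)])
print(late)  # prints [5.0]
```

How often to sample and which quantities to watch are exactly the questions a monitoring strategy has to answer; the sketch leaves both to the caller.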

Building Causal Models of Program Behavior using Path Analysis. During this
contract we began applying path analysis to our work in Phoenix. We think path
analysis will provide a powerful technique for building causal models from large data
sets such as those generated by experiments in Phoenix and the PI's CPE. We recently
enhanced CLASP by adding a module for path analysis. Using a graphical interface,
the user draws a directed graph of hypothesized causal influences among
independent and dependent variables, and the path analysis module calculates the
corresponding path coefficients (strengths of influence) along the arcs (one per arc).
The user can explore variations on the model simply by modifying nodes and arcs in
the graph; recalculation is done automatically. Such a facility will help users fit
causal models to planner behaviors that arise in CPE simulations. As part of a new
contract in the PI, we will be building an experiment module that automatically
generates causal models from data sets to aid developers in the design and evaluation of
AI planning systems.

¹ The Common Lisp Analytical Statistics Package (CLASP) was originally implemented on the TI Explorer
for analyzing Phoenix experiments. For more on CLASP, see Fisher (1990).
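
To make the path-analysis computation concrete, here is a minimal Python sketch, not the CLASP module itself. It assumes the standard formulation in which the path coefficients into a node are the standardized regression weights of that node on its hypothesized parents. The function names, the graph encoding, and the variables `wind`, `fuel`, and `area` are hypothetical.

```python
import math
from typing import Dict, List, Tuple


def standardize(xs: List[float]) -> List[float]:
    """Rescale values to zero mean and unit (population) standard deviation."""
    n = len(xs)
    mean = sum(xs) / n
    sd = math.sqrt(sum((x - mean) ** 2 for x in xs) / n)
    return [(x - mean) / sd for x in xs]


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def path_coefficients(data: Dict[str, List[float]],
                      graph: Dict[str, List[str]]) -> Dict[Tuple[str, str], float]:
    """Return one coefficient per arc (parent, child) of the causal graph.

    `graph` maps each dependent variable to its hypothesized direct causes.
    """
    z = {name: standardize(vals) for name, vals in data.items()}
    n = len(next(iter(z.values())))

    def corr(u: str, v: str) -> float:
        return sum(p * q for p, q in zip(z[u], z[v])) / n

    coeffs = {}
    for child, parents in graph.items():
        if not parents:
            continue
        rxx = [[corr(p, q) for q in parents] for p in parents]
        rxy = [corr(p, child) for p in parents]
        for parent, beta in zip(parents, solve(rxx, rxy)):
            coeffs[(parent, child)] = beta
    return coeffs


# Toy data set in which area = wind + fuel and the two causes are uncorrelated.
data = {
    "wind": [3.0, 1.0, 3.0, 1.0],
    "fuel": [2.0, 2.0, 0.0, 0.0],
    "area": [5.0, 3.0, 3.0, 1.0],
}
graph = {"area": ["wind", "fuel"]}
coeffs = path_coefficients(data, graph)
for arc, beta in coeffs.items():
    print(arc, round(beta, 3))
```

Editing the `graph` dictionary and calling `path_coefficients` again plays the role of the automatic recalculation described above; a real tool would also report how well the model fits the data.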

1991 

In April, David Hart (EKSL Lab Manager) and David Westbrook (EKSL Systems
Developer) visited Mark Burstein at BBN to see a demonstration of the Dynamic
Replanning and Analysis Tool (DART) being developed as part of the DARPA
Planning Initiative for USTRANSCOM. As part of this initiative, EKSL began work
under contract in the last quarter of FY91. Our visit to BBN was designed as an
exchange of information about the use of simulation in complex planning problems.

A significant part of BBN's contribution to DART is the Prototype Feasibility Estimator
(PFE), a dispatch scheduling program designed to demonstrate the gross feasibility of
USTRANSCOM operation plans. These plans are currently developed by
USTRANSCOM planners, but will eventually be generated by planning technology
produced by this initiative. We discussed possibilities for enhancing PFE to simulate
the movement of resources and cargo through the transportation network, much as
we simulate the movement of fire-fighting agents in the Phoenix world. Such a
simulation could be used to watch a plan execute, allowing operators to recognize
problems as they develop and "steer" the plan around them.

Mark  Burstein  visited  EKSL  in  August  to  see  Phoenix  and  continue  the  discussions 
mentioned  above.  While  here  he  consulted  on  our  efforts  to  get  PFE  running  and  gave 
us  some  invaluable  hands-on  assistance. 

1990 

Paul Cohen and David Hart visited the Decision Systems Laboratory at Texas
Instruments in Dallas, May 24-25. Cohen presented a talk entitled “Fire will Destroy
the Pestilence, or, How Natural Environments will Drive Out Bad Methodology.”
Phoenix was demonstrated for the DSL, and we looked at a number of their projects,
including CACTUS, a battlefield planning system that is conceptually similar to
Phoenix, but implemented differently. We discussed doing a comparative analysis of
these two systems to show they fall within an equivalence class with respect to the
task environments and design of agents for those environments. Such an analysis
would attempt to show that both systems can be represented using the same
underlying model for the task environment and agent design, thus substantiating the
methodological approach we advocate.

We also discussed at length with TI the use of visualization techniques to aid in the
interpretation and analysis of a system that simulates shop floor activities in a
semiconductor fabrication plant. The simulation allows experimentation with various
scheduling strategies to improve plant throughput. However, the volume of data it
produces overwhelms the capabilities of traditional data analysis techniques. Our
discussions focused on ways of visualizing pathologies that arise during (the
simulation of) shop floor processing that cause the operant scheduling strategy to
perform poorly or fail, so that the user can intervene as problems develop and explore
the causes by pausing and interacting with the system graphically. These ideas are
based on our work in Phoenix with simulation, graphical interfaces, and envelopes;
they are also the subject of another DARPA contract (Visualization and Modeling for
Interactive Plan Development and Plan Steering).

Paul Cohen presented a talk entitled “Modelling for AI System Design” at Digital
Equipment Corporation in Galway, Ireland, and at the Imperial Cancer Research
Fund in London on June 25. These talks led to plans to hold a workshop sponsored by
NSF and DARPA in early 1991 on methodology in AI research. DEC considered using
the Phoenix planning system as part of a market simulator for new computer
products, designed to allow DEC marketing executives to simulate alternative pricing
structures for the products in order to find the most advantageous. Phoenix planning
agents play the roles of DEC's competitors, responding to the introduction of DEC
products with changes in their own product lines and pricing structures.

Gerald M. Powell was a visiting faculty member at EKSL under the Secretary of the
Army Research and Study Fellowship Program. Dr. Powell, who works for the
Center for Command, Control, and Communications Systems, CECOM, Ft.
Monmouth, New Jersey, had investigated computational approaches to various
problems in battlefield planning for the previous five years, and was very interested in
the design and development of Phoenix. He had worked previously with Paul Cohen
applying envelopes to an operations planning problem in battlefield management.
During the reporting year he studied (among other issues) the application of
approximate processing techniques for real-time control in Phoenix.

2.6 Software Prototypes

1991 

In 1991 we enhanced the Common Lisp Analytical Statistical Package (CLASP) by
adding several new statistical tests and porting it to a UNIX-based Common Lisp
environment. CLASP was developed (under URI funding) for the statistical analysis of
large data sets on the TI Explorer. This modularized system is the kernel of the
Phoenix experimental interface. It can be used as an interactive analysis tool for data
generated experimentally, providing powerful data manipulation tools, standard
statistical tests, and plotting capabilities. In addition, it can be accessed as a runtime
library by programs (e.g., Phoenix agents) using statistical and probabilistic models.

This ported version of CLASP will be integrated into the Common Prototyping
Environment (CPE) being developed at BBN for the DARPA Planning Initiative (see
Technology Transfer), where it will be used to analyze the dynamics of the
transportation problem, as well as the planning/scheduling techniques applied to the
problem (now completed, 1993). While the interface being developed for this ported
version is specific to CPE, future plans include providing a generic CLASP interface
using the Common Lisp Interface Manager (now completed, 1993). This version of
CLASP would run standalone in most Common Lisp environments. To support such
an implementation, we have developed and documented an automated test suite for
the CLASP package that validates its functionality. This test suite can be run to
uncover bugs and inconsistencies between systems and versions whenever CLASP is
ported to a new platform.

1990 

We have made Phoenix available as an instrumented testbed for use by other
researchers designing autonomous agents for complex, real-time environments as
part of the Intelligent Real-time Problem Solving initiative (Cohen, Howe & Hart 1989)
and as part of a new initiative in evaluation and benchmarking of planning systems
for complex, dynamic environments (Cohen & Howe 1990; Hart & Cohen 1990). It is
also being used by the Cooperative Distributed Problem Solving Laboratory (under
Victor Lesser) at the University of Massachusetts (Moehlman & Lesser 1990).

3. Failure Recovery and Failure Recovery Analysis in Phoenix

This paper, summarizing part of Adele E. Howe's thesis “Accepting the inevitable: The role of failure
recovery in the design of planners,” has been submitted to the IEEE Transactions on Knowledge and Data
Engineering. Some of these results appeared previously in the Proceedings of the Ninth and Tenth National
Conferences on Artificial Intelligence (1991 and 1992).

Abstract

As planning technology improves, AI planners are being embedded in increasingly complicated
environments: ones that are particularly challenging even for human experts. Consequently,
failure is becoming both increasingly likely for these systems (due to the difficult
and dynamic nature of the new environments) and increasingly important to address (due
to the systems' potential use on real world applications). This paper describes the
development of a failure recovery component for a planner in a complex simulated environment
and a procedure (called Failure Recovery Analysis) for assisting programmers in debugging
that planner. The failure recovery design is iteratively enhanced and evaluated in a series of
experiments. Failure Recovery Analysis is described and demonstrated on an example from
the Phoenix planner. The primary advantage of these approaches over existing approaches
is that they are based on only a weak model of the planner and its environment, making
them suitable when the planner is being developed and making them easily generalizable to
other planners and environments. Together, failure recovery and Failure Recovery Analysis
improve the reliability of the planner by preventing failures due to bugs in the planner and
repairing failures during execution.


* This research was supported by a DARPA-AFOSR contract F49620-89-C-0113, the National Science
Foundation under an Issues in Real-Time Computing grant, CDA-8922572, and a grant from the Office of Naval
Research under the University Research Initiative N00014-86-K-0764. The US Government is authorized to
reproduce and distribute reprints for governmental purposes notwithstanding any copyright notation hereon. This
research was conducted as part of my PhD thesis research at the University of Massachusetts. I would like to
thank my thesis advisor, Paul Cohen, for his advice, guidance and supervision of the research.

1 Introduction

As planning technology improves, AI planners are being embedded in increasingly
complicated environments: ones that are particularly challenging even for human experts.
Consequently, failure is becoming both increasingly likely for these systems and increasingly important
to address. Failure is increasingly likely because of the difficult and dynamic nature of the new
environments; failure is increasingly important to address because of the systems' potential
use on applications such as scheduling manufacturing production lines ['i'!] and Hubble space
telescope time [10], and controlling robots [l-')].

AI planners determine a course of action: it may be the next action to be taken or may be
a long sequence of actions. Plan failures may be caused by actions not having their intended
effects, by unexpected environmental changes or by inadequacies in the planner itself. This
paper describes an integrated approach to dealing with failure, which uses feedback from failure
recovery to help debug the failures. This approach was developed as part of my thesis research
on improving the reliability of the Phoenix planner.

1.1 Approaches to Improving Planner Reliability

Reliability can be improved in two ways: automated failure recovery
and debugging. The first is designing the software to detect and repair its own failures;
the second method is to debug the software to remove the causes of failure.

The first part of this paper describes the design and the methodology behind designing and
testing failure recovery for an AI planner. The second part describes a procedure
called Failure Recovery Analysis (FRA) for analyzing the performance of failure recovery
to identify how the planner's knowledge base might influence the occurrence of particular failures.
The two parts are integrated: as Figure 1 shows, failure recovery is [...].


F49620-89-C-0113,  Final  Technical  Report 


Paul  R.  Cohen 


Failure recovery repairs failures that arise during normal planning and acting. FRA "watches"
failure recovery for clues to bugs in the planner and informs the designer of the bugs. The
designer can then attempt to repair the plan knowledge base to prevent the failure from occurring
again and, using FRA, can evaluate whether the repair was successful.

[Figure 1 diagram. Box labels: Failure Recovery (Plan & Act; Detect Failure; Repair Failure) and
Failure Recovery Analysis (Analyze failure recovery for sources of failure; Repair planner or
failure recovery).]

Figure 1: Relationship of failure recovery to Failure Recovery Analysis.


1.2  The  Target  Planner  and  its  Environment 

The Phoenix system [...] serves as the [...] shown in Figure 2. Phoenix provides
the simulated environment, [...] plan knowledge bases for agents, and an experimental
interface for [...] controlling experiments and collecting data. Its environment is
forest fire fighting in Yellowstone National Park.

The goal of forest fire fighting is to contain fires as efficiently as possible. Fires spread
in irregular shapes, at variable rates, as a function of ground cover, [...]
wind speed and direction, and natural boundaries (e.g., bodies of water [...]). Fires
are contained by removing fuel from their paths, causing them to burn out [...].
Building fireline requires the coordination of many agents to surround the fire with fireline or
natural boundaries. One agent, the fireboss, coordinates the activities of field agents, bulldozers,
to surround the fire with fireline. Other agents, watchtowers, gasoline carriers, and helicopters,
support the activities of the fireboss and bulldozers by gathering information and carrying
gasoline to refuel in the field.

Figure 2: Diagram of the separate processes that comprise the Phoenix system. The arcs
between the processes indicate a transfer of data between processes.

Figure 3 shows the interface to the simulator. The map in the [...] of the display
depicts Yellowstone National Park north of Yellowstone Lake. [...] and
direction are shown in the window in the upper left, and geographic features such as rivers,
roads and terrain types are shown as light lines or grey shaded areas. [...] are
building fireline around a fire near the center of the figure. [...] at the top
near the center.


All Phoenix agents [...] and a planner. Sensors perceive the state of the
environment; effectors change the environment. Together, reflexes and the planner form
[...] system. Each layer [...] level of competence: reflexes handle changes that
occur faster than the [...] component can respond, and the planner [...].

Figure 3: View from Phoenix simulator of bulldozers fighting a fire.

Phoenix plans fail for a variety of reasons. The environment can change [...];
if the planner bases its plan on slow [...] or no change in the environment, the plan may
fail. Phoenix agents are limited in their abilities to sense the environment [...].
Because they are based on incomplete or uncertain information, Phoenix plans also fail [...].
They include bugs and have not been tested in all possible situations.

2 Failure Recovery

The purpose of failure recovery is to repair, as efficiently as possible, the plan so as to
resume progress toward the failed plan's goal. Several areas of AI have developed automated
failure recovery techniques for knowledge based systems (most notably: robotics ['J.G.du] and
planning ['.23,27,29]). In essence, the different approaches all have the same basic steps, as
shown in Figure 4. The system continues to execute (planning and acting) until it recognizes
that its actions are failing. Then, the system deals with the failure by taking some corrective
action.


Figure 4: Flow of control [...] after a failure [...] repaired.


The two basic approaches to failure recovery are forward and
backward recovery. [...] the system is returned to some previous state [...]
from there. Backward recovery requires that actions can [...] that the system [...]
control over its environment [...] and planning systems. Forward recovery [...]
repairing the failure [...]
of failures and their causes. Time Petri net models, real-time logic and fault tree analysis are
common techniques for modeling the causes of failures and for guiding recovery in software
(e.g., [13,18]). Formal theories of planning and replanning have been proposed to guide plan
repair (e.g., [11,17]).

Failure recovery can be treated as a normal part of the planning process. Lesser's
Functionally Accurate, Cooperative (FA/C) paradigm for distributed problem solving [12] and
Ambros-Ingerson et al.'s IPEM [1] treat failure recovery as just another planning or problem solving
task.

Both recovery through formal analysis and as part of planning require that the planner
employs a strong model of what to do in any situation, including failures. Heuristic approaches
allow for gaps in knowledge and apply recovery methods to repair the failure. Typically,
heuristic approaches operate by "retrieve-and-apply" [20] which maps observed failures to suggested
responses. The most comprehensive and commonly used domain-independent strategy is
replanning (e.g., ['1-4.20]), which involves restating the planning problem and re-initiating planning.
Other domain-independent recovery methods are based on an informal model of how that
planner's plans fail and can be modified (e.g., [:;{.29]). Robotics and other areas that treat planning
as a subtask have favored domain-dependent recovery methods (e.g., [...]).

Many different approaches have been proposed for failure recovery; most depend either on
the domain or on the planner design. However, the literature offers few suggestions about how
to design failure recovery for a new planner or domain. For constructing domain-dependent
recovery programs, Nof et al. [jib] propose a four-step framework: analyze the task, develop
alternative recovery strategies, determine a selection strategy, and update based on experience
with the system. Similarly, [...] advocates starting the system with basic competence
at its task (i.e., no failure recovery) and then adding execution monitoring and failure
recovery methods as needed. [...] advocates combining both domain-independent and
domain-specific methods; the system can try the more efficient domain-specific methods when
they are available, but fall back on the domain-independent methods when necessary.

2.1  Designing  Failure  Recovery  for  Phoenix 

Most of the previously mentioned approaches to failure recovery classify the failure and
select from a set of methods for adapting the plan in progress. The approaches differ in their
classification of failures and their recovery methods; so how does a designer decide on a failure
classification and a set of recovery methods for a new domain? The approach adopted in this
research is to begin with a flexible method selection mechanism and a core set of recovery
methods and then refine the set by evaluating failure recovery performance in the host environment.
This section will describe the basic design of failure recovery for the Phoenix planner and the
methodology that directed the re-design and evaluation of that failure recovery component.

Design of Failure Recovery: In Phoenix, failures can be detected during construction
of the plan or during its execution, and can be due to anything from rapid change in the
environment to bugs in the plans. At present, the Phoenix fireboss detects ten types of failures,
and bulldozers detect five types, as shown in Table 1. The classifications of failure types are
largely domain-dependent. Failures are classified by what is known about what blocked
the plan from continuing successfully; the actual cause of the failure is unknown.

Table 1: List of Failure Types for the Phoenix Fireboss and Bulldozers

Agent       Failure Types
Fireboss    (CCP) Can't Calculate Path
            (CCV) Can't Calculate Variable
            (CFP) Can't Find Plan
            (FNE) Fire Not Encircled when it should be
            (IP)  Insufficient Progress to contain the fire
            (NER) Not Enough Resources to contain the fire
            (NRS) No Remaining fireline Segments to build
            (PRJ) Can't Calculate Projection of fire
            (PTR) Can't Calculate Path to Road
            (RU)  Resource Unavailable
Bulldozer   (CCP) Can't Calculate Path
            (DOP) Deadly Object in Path
            (NVV) No Variable Value
            (OOF) Out Of Fuel
            (PM)  Position Mismatch

In Phoenix, failure recovery is initiated in response to detecting a failure, an event that
precludes successful continuation of some plan. Failure recovery iteratively tries recovery methods
until one works, at which point the plan is resumed. For example, Figure 5 depicts the flow
of control between failure recovery and the rest of the plan for an [...]
failure. A failure detection mechanism has determined that the plan is taking too long to complete.
Failure recovery deals with this failure by searching a library of recovery methods for those applicable
to the failure type. It selects one method from the possible set and executes it. If the recovery
method succeeds, then failure recovery completes and the rest of the plan executes; otherwise, it
abandons the current attempt, selects another method and tries again. This process continues
until a method succeeds or it runs out of methods to try.
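
The recovery loop just described can be sketched in a few lines of Python. This is an illustrative sketch rather than the Phoenix implementation: it assumes that methods are tried in the order in which a library lists them for each failure type, and the function name `recover`, the dictionary layout, and the stub methods are all hypothetical.

```python
from typing import Callable, Dict, List, Optional

# A recovery method attempts a repair and reports whether it succeeded.
RecoveryMethod = Callable[[], bool]


def recover(failure_type: str,
            applicable: Dict[str, List[str]],
            methods: Dict[str, RecoveryMethod]) -> Optional[str]:
    """Try applicable methods in turn; return the name of the first one
    that succeeds, or None if the library is exhausted."""
    for name in applicable.get(failure_type, []):
        if methods[name]():
            return name  # repair worked; the rest of the plan can resume
        # Otherwise abandon this attempt and move on to the next method.
    return None


# Hypothetical library: which methods apply to which failure type, in the
# order they will be tried (the method names are those of Table 2).
applicable = {"IP": ["WATA", "RP", "RT"]}
# Stub methods standing in for real plan repairs.
methods = {"WATA": lambda: False, "RP": lambda: True, "RT": lambda: True}

print(recover("IP", applicable, methods))   # prints RP
print(recover("OOF", applicable, methods))  # prints None
```

Returning `None` corresponds to running out of methods to try; the order in which methods are attempted is the design decision that determines how costly recovery is on average.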


Figure 5: [...] flow of control between failure recovery [...].

The recovery methods [...] make mostly simple repairs to the structure of [...].
Consequently, these methods can be used in different situations, do not require expensive explanation,
and provide a response to any failure. This strategy sacrifices efficiency for generality and
results in a planner capable of responding to most failures, but perhaps in a less than optimal
manner. The six methods in the basic library are listed in Table 2.

Table 2: Set of Recovery Methods for Phoenix

Method  Description
WATA    Wait and try the failed action again.
RV      Re-calculate one variable used in the failed action.
RA      Re-calculate all variables used in the failed action.
SA      Substitute a similar plan step for the failed action.
RP      Abort the current plan and re-plan at the parent level (i.e., the level in the plan
        immediately above this one).
RT      Abort current plan and re-plan at the top level (i.e., redo the entire plan).

The first four methods make changes local to the failed action and surrounding actions; the
last two replan at either the next higher level of plan abstraction or at the top level. All of
these methods make structural changes to plans in progress and are applicable to all
failures. These recovery methods, or ones very like them, have been used in [...]
systems. WATA is similar to the "retry" method described in [...]. RV and RA are [...]
specific forms of [...]. SA is similar to [...]
[21] and action substitution [...]. RP
and RT are constrained forms of the more general replanning [...]
systems.

2.2 Evaluation: Tailoring Failure Recovery to Phoenix

Failure recovery was [...] the Phoenix planner [...] of failures [...] new recovery methods
[...] to the design process [...] effect of new methods by evaluating the performance of the
entire set.

The methodology was to define performance in terms of an expected cost model and use
the model to direct improvements to the design. The expected cost model accumulates the
cost of trying enough methods to repair the failure: this amounts to the cost of trying the first
method, plus the cost of trying a second method if the first method fails, plus the cost of trying
the third, and so on until no methods remain. When no methods remain, the cost is the
cost of outright failure. This combination of costs is captured in the following equation:

$$C(S_i) = C(M_a) + (1 - P(M_a \mid S_i))\,[\,C(M_b) + (1 - P(M_b \mid S_i))\,[\,C(M_c) + (1 - P(M_c \mid S_i))\,[\,\cdots\,[\,C_f\,]\,]\,]\,] \qquad (1)$$

where C(S_i) is the expected cost of recovery for failure S_i; C(M_a) is the cost of employing an
applicable method a; C_f is the cost of failing to recover; and P(M_a | S_i) is the probability of
method a succeeding in failure S_i. Cost is measured in seconds of simulation time required to
repair a plan using the method.
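
The nested form of equation 1 can be evaluated from the inside out. The following is a minimal sketch, assuming each method is given as a (cost, probability of success) pair in the order it is tried; the function name and the numbers in the usage note are illustrative and do not come from Phoenix.

```python
from typing import Sequence, Tuple


def expected_recovery_cost(
    methods: Sequence[Tuple[float, float]], failure_cost: float
) -> float:
    """Evaluate equation 1 for one failure situation.

    methods: (cost, p_success) pairs in the order the methods are tried.
    failure_cost: C_f, the cost paid when every method has failed.
    """
    # Start from the innermost term, C_f, and wrap one method at a time:
    # cost of the method plus (1 - p_success) times everything that follows.
    expected = failure_cost
    for cost, p_success in reversed(methods):
        expected = cost + (1.0 - p_success) * expected
    return expected
```

For instance, with two hypothetical methods (10, 0.5) and (20, 0.8) and C_f = 100, the inner term is 20 + 0.2 * 100 = 40 and the total is 10 + 0.5 * 40 = 30.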

The performance of failure recovery was assessed in a series of three experiments [10]. The
first experiment tested the assumptions upon which the model is based and derived performance
baselines for failure recovery in the Phoenix environment. The second compared two approaches
to selecting recovery methods: choosing randomly and choosing based on a strategy derived
from the model. The third experiment compared two sets of recovery methods: the original
set and the same set with two new domain-specific recovery methods added. Each of the
experiments collected data on the failure recovery of both fireboss and bulldozer agents. Because
the bulldozers do far less planning than the fireboss, the bulldozer results tend to be [...]
but less interesting. Consequently, only the results of the fireboss are presented.

2.2.1 Experiment 1: Baselines for Performance

This experiment gathered baselines for the parameters in the performance model and tested
the assumptions of that model. The model is based on three assumptions:

1. C(M_m) is independent of the order of execution of the recovery methods.
Having tried one recovery method which failed should not cause other methods to be
more or less expensive to execute.

2. P(M_m | S_i) is independent of the order of execution of the recovery methods. If
this assumption is true, then whether a recovery method is tried after another fails should
have no effect on whether the new method succeeds.

3. The cost of each method C(M_m) is independent of the situation S_i. Because the
recovery methods are designed to be domain-independent, intuition suggests that their
costs may be independent of when they are used.

This experiment consisted of [...] trials, in which Phoenix fought three fires [...]
of [...] simulation hours, resulting in [...] failure situations and [...] attempts to recover from
the failures. The three fires were set at intervals of eight simulation hours. [...]
once an hour, the wind speed and direction varied [...]. The
experiment collected data on what failures occurred, what recovery [...], the
order in which recovery methods were tried, and the cost in simulation time of [...]
recovery methods. The agents were given failure recovery methods as described in Section 2.1;
bulldozer agents were given an additional method for getting out of the [...].
Recovery methods were selected randomly without replacement [...], with
one exception [...].

Statistical tests on the data (ANOVAs for assumptions one and three and chi-square tests
for assumption two) showed that the assumptions held for a subset of the methods. In
particular, the performance of the two replan methods was sensitive to whether the replans follow
other methods (assumptions one and two) and to the failure situation in which they are
applied (assumption three); the four remaining methods were insensitive to failure context and
order of application. Hence, the assumptions held for the four methods that make constrained
modifications local to the failed action, but not for the two methods that make more sweeping
modifications.

2.2.2 Experiment 2: Selecting Recovery Methods

Failure recovery, as implemented for experiment 1, selected recovery methods at random
without replacement to repair each failure. Given the model presented in equation 1 and
the values for C(M_m) and P(M_m | S_i) gathered in experiment 1, we can guide the selection of
recovery methods to minimize cost. Simon and Kadane [24] showed that, for problems of the
type described by equation 1, the expected cost of executing a sequence is minimized by the
strategy of trying the methods in decreasing order of

$$\frac{P(M_m \mid S_i)}{C(M_m)}$$

which intuitively means "select the method that is most likely to succeed with the lowest cost".
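
The selection strategy can be sketched as a sort on the ratio of success probability to cost. The code below is an illustration under the model's independence assumptions, not the Phoenix implementation; the function names are hypothetical and the costs and probabilities in the usage note are made up.

```python
from typing import Dict, List, Sequence, Tuple


def order_by_ratio(methods: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return method names in decreasing order of p_success / cost.

    methods maps a method name to (cost, p_success); costs must be positive.
    """
    return sorted(methods, key=lambda m: methods[m][1] / methods[m][0], reverse=True)


def expected_cost(
    order: Sequence[str], methods: Dict[str, Tuple[float, float]], failure_cost: float
) -> float:
    """Expected cost of trying the methods in the given order (equation 1)."""
    expected = failure_cost
    for name in reversed(order):
        cost, p_success = methods[name]
        expected = cost + (1.0 - p_success) * expected
    return expected
```

With hypothetical values {"WATA": (100, 0.2), "RV": (50, 0.5), "RP": (400, 0.9)} and a failure cost of 1000, the ratios are 0.002, 0.01 and 0.00225, so the ordering is RV, RP, WATA, and enumerating all six orderings confirms that this one has the lowest expected cost (295).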

We added the selection strategy to failure recovery for Phoenix, then re-ran the same
experiment scenario as in experiment 1. Experiment 2 included [...] trials, which resulted in [...]
failure situations and [...] attempts to recover from the failures. The costs of failure recovery
for each failure type in this experiment were compared to the results from experiment 1. Table 3
shows the mean costs of failure recovery for each failure type and the [...] for
each failure type, for experiments 1 and 2.

Table 3: Fireboss failures in the Baseline and Strategy experiments. Column alignment did
not survive in the source; legible entries are listed in source order and [...] marks illegible ones.

Failure types:                      cep, cev, cfp, fne, ip, ner, nrs, prj, [...], ru
Exp. 1 Costs (Random Strategy):     1932, [...], 2165, 1373, 3234
Exp. 1 P(S_i):                      [...], .073, .077
Exp. 2 Costs (Selection Strategy):  [...], 2707, [...], 414, [...], 1041, 2314
Exp. 2 P(S_i):                      .212, [...], .004, [...], .043, [...]

The mean recovery cost for the fireboss was 2943
for Experiment 1 (sd = [...], n = [...]) and 2500 for Experiment 2 (sd = 4024, n = [...]). A
z-test on the differences between the mean recovery costs for the fireboss in the two experiments
yielded a significant result (z = -2.83, p < .0023). Because not all of the assumptions of the
model held, the selection strategy is not guaranteed to be optimal, but based on these results,
it appears to significantly reduce the overall cost of failure recovery.
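
A two-sample z-test of this kind can be sketched as follows. This is a generic illustration, assuming independent samples and a normal approximation; the function names are hypothetical, and the source does not say exactly how its test was computed.

```python
import math


def two_sample_z(mean1: float, sd1: float, n1: int,
                 mean2: float, sd2: float, n2: int) -> float:
    """z statistic for the difference between two independent sample means."""
    standard_error = math.sqrt(sd1 ** 2 / n1 + sd2 ** 2 / n2)
    return (mean1 - mean2) / standard_error


def one_tailed_p(z: float) -> float:
    """Probability that a standard normal variable is at least |z| from zero on one side."""
    return 0.5 * math.erfc(abs(z) / math.sqrt(2.0))
```

As a consistency check, a z value near -2.83 corresponds to a one-tailed probability of roughly .0023, which agrees with the significance level reported above.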

2.2.3 Experiment 3: Tailoring the Method Set

The set of recovery methods described in Section 2.1 are [...] and intended to be [...] in
[...] environments. This experiment tested whether we could further improve
performance by [...] specifically for those failures that were [...]
inadequately by [...]. Two new methods were added for each of the agents. [...]
methods were [...] failures that were both expensive and frequently occurring: [...], prj
and ner for the fireboss. The new methods were based on existing methods and were [...]
to Phoenix.

[...] trials of Phoenix were run using the same [...] failures due, at least in part, to the [...]
as in the previous experiments; failure recovery incorporated the selection strategy used in
experiment 2 with the two new methods added. These trials resulted in 1540 failure situations
and 4279 attempts to recover from the failures. As Figure 6 shows, the overall costs of failure
recovery for the fireboss decreased in the three failure situations addressed by the two new
methods, but the costs increased in all but one of the other failure situations. Because the
three targeted failures occurred frequently, this resulted in a reduction of the mean cost over
all failure situations (from 2500 to 2355), but the reduction was not statistically significant.

Figure 6: Cost changes from Experiment 2 to Experiment 3.

2.3  Summary  of  Failure  Recovery 

The methodology followed for implementing failure recovery was to construct a basic set of
general recovery methods and a model of expected cost and then run a set of experiments to
test assumptions about performance and expectations of how [...], [...]
refining failure recovery to address the requirements of the environment. Based on the results of
the baseline experiments, it appears that an untuned set of methods performs reasonably well,
at least in Phoenix, and the method set can be tuned, within limits, to suit the environment
better. The advantage of constructing failure recovery iteratively using a model is that the
system achieves a basic level of performance quickly and, at each stage, both the failure recovery
implementation and our understanding of what works improve.

In terms of constructing failure recovery for some other environment and planner, the most
important result from these experiments is the insensitivity of certain properties of some general
methods (the local methods) to aspects of their execution context: cost is independent of
position and failure situation, and probability of success for a situation is independent of position
of execution. The result is important both for design and methodological considerations. As
designers, we know that if such independence assumptions hold in other planners, then we can
use a similar ordering strategy and predict performance for these local methods. Furthermore,
given the methodology of directing evaluation using a model, we can state and test some of the
assumptions of our designs.

The implication of the experiment results for designers of failure recovery is that it can be
difficult to control the effects of even small changes to failure recovery. For example,
[...] added in [...] recovering from three types of failure [...]
and [...] consequences: the [...] and frequency of failures changed [...]
from failures not addressed by the new recovery
[...]. The incidence of failures changed because the new recovery methods were
preventing some failures, causing others, and permitting some plans to get further along before
they [...]. Because the recovery methods make mostly syntactic changes to the plan [...]
the [...] of the plan library, we suspected that it was not the recovery methods themselves
[...] but the [...] parts of the plan library. From this intuition [...]
described in the next section.

3 Debugging the Planner

Failure recovery is an approach for dealing with unexpected sources of failure. Typically,
we prefer to avoid failures rather than patch up the plans after the failures. Some failures
are difficult to avoid because the environment is capricious or the contingencies are simply too
expensive. The failures that seem most amenable to avoiding (i.e., those over which we have
the most control) are those caused by bugs in the planner itself: debugging the planner removes
such sources of failure.

Most AI approaches to debugging planners are knowledge-intensive. Sussman's HACKER [27]
detects, classifies and repairs bugs in Blocksworld plans, but it requires considerable knowledge
about its domain. Hammond's CHEF [7] backchains from failure to the states that caused it,
applying causal rules that describe the effects of actions, and maps the information to
canonical failures and repair strategies. Simmons' GORDIUS [23] debugs faulty plans by regressing
desired effects through a causal dependency structure constructed during plan generation from
a causal model of the domain. Kambhampati's approach [11] requires the planner to generate
validation structures, explanations of correctness for the plan. His theory of plan modification
compares the validation structure to the planning situation, detects [...], and uses
the validation structure to guide the repair of the plan.

The approaches described so far are comprehensive: they all analyze a plan to [...] went
wrong and repair the plan or model to avoid the failure in the future. Thus, they all assume
that the analysis mechanism possesses a complete model of the domain and/or plan.
Alternatively, the debugger could solicit outside help or information by asking a human [...] or by
inferring [...] obtained from failure recovery execution. [...] views failure [...] as an
opportunity for knowledge acquisition and requests assistance from the human user in
order to augment the system's model. Zito-Wolf and Alterman adapt plans when
plans fail [30]; those adaptations repair failures and then are used to [...] the plan stored
in long-term memory with choice points and alternative actions. In these approaches, we may
sacrifice completeness, or degree of automation, to broaden the types of failures addressed.

3.1 Failure Recovery Analysis: An Approach to Debugging the Plan Knowledge Base

Failure Recovery Analysis (FRA) exploits information about observed relationships between failures
and repairs to help the designer discover how the planner or failure recovery may be causing
failures. Because it relies primarily on execution data, FRA requires little knowledge to identify
contributors to failures and only a weak model to explain how the planner might have caused
failures. Complementary to the more knowledge-intensive approaches, FRA is most appropriate
when a rich domain model is not available or when the existing model might be incorrect or
buggy, as when the system is under development.

FRA is a partially automated procedure that is controlled by a designer, rather
than a fully automated system. The designer participates at [...] process, [...]
where to focus attention and [...] how to fix the planner. [...]
power for generality. Because it uses a weak model, it can [...]
be appropriate for many planners [...]
explain the failures.

In FRA, plan debugging proceeds as an iterative [...]
tests the design, locates flaws and modifies the planner to remove [...].
The process continues until the designer is satisfied [...].

Step 1: Run Planner. [...]
collecting execution traces of what failures occurred and how [...];
this was part of the information collected in the experiments described in Section [...]. An [...]
is an abstracted view of the interactions between the planner and its environment.

Figure 7: Failure Recovery Analysis: The Designer iteratively debugs the planner by [...]
the planner, locating flaws and modifying the planner to remove the flaws.


For purposes of [...], we view the operation of the planner in its environment as a [...]
of plan failures followed by [...] actions [...], as in
the following [...], where [...] are environment states [...] and [...] are
actions; the subscripts [...].

Table 4: Contingency Table for [R_sp, F_ip]. [...]

[...] subsequent failures. Dependencies tell the designer how the recovery actions influence the failures
that occur and how one failure influences another.

Dependencies can be detected by statistically analyzing the execution traces. The statistical
analysis is a two-step process: combinations of failures and actions are first tested for whether
they are more or less likely to be followed by each of the possible failures. Then the significant
combinations are compared to remove overlapping combinations.
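
The first step needs, for each candidate pair, a 2x2 table of counts taken from the traces. The sketch below assumes a trace is stored as a list of (recovery action, failure that followed it) pairs; this representation, the function name and the labels in the usage note are assumptions made for illustration, not the format used by Phoenix.

```python
from typing import Iterable, List, Tuple

# One trace: (recovery action, failure observed next), in execution order.
Trace = List[Tuple[str, str]]


def contingency_table(
    traces: Iterable[Trace], action: str, failure: str
) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Count how often `failure` follows `action` versus everything else.

    Returns ((a, b), (c, d)) where
      a = failure after action,        b = other failures after action,
      c = failure after other actions, d = other failures after other actions.
    """
    a = b = c = d = 0
    for trace in traces:
        for recovery, next_failure in trace:
            if recovery == action:
                if next_failure == failure:
                    a += 1
                else:
                    b += 1
            elif next_failure == failure:
                c += 1
            else:
                d += 1
    return ((a, b), (c, d))
```

For two short made-up traces, [("R_sp", "F_ip"), ("R_wata", "F_ner")] and [("R_sp", "F_ip"), ("R_sp", "F_ner"), ("R_rv", "F_ip")], the table for action R_sp and failure F_ip is ((2, 1), (1, 1)).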

For example, to determine whether F_ip is related to R_sp, we compare the incidence of F_ip
after R_sp to F_ip after any action other than R_sp. To test the significance of the difference
between the incidences, we apply a statistical test (the G-test, which is similar to the chi-square test)
to a 2x2 contingency table of the counts of F_ip after R_sp, states other than F_ip after R_sp, F_ip
after actions other than R_sp, and environment states other than F_ip after actions other than R_sp.
We collect these frequencies from many execution traces and [...] in the contingency
table. [...] the numbers in the table (taken from the data from the experiments described
in Section 2.2) show that F_ip follows R_sp far more frequently than it follows other
actions. A G-test on this table yields G = 12.[...], p < .001, which suggests that [...]
table in [...] is extremely unlikely to have arisen by chance if R_sp and F_ip [...],
so we conclude that F_ip depends on R_sp (abbreviated [R_sp, F_ip]).
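
A G-test on a 2x2 table can be sketched as below. This is the textbook likelihood-ratio statistic, G = 2 * sum(O * ln(O / E)), compared against a chi-square distribution with one degree of freedom; it is not necessarily the exact variant used in FRA, and the counts in the usage note are invented.

```python
import math
from typing import Tuple

Table = Tuple[Tuple[int, int], Tuple[int, int]]


def g_test_2x2(table: Table) -> Tuple[float, float]:
    """Return (G, p) for a 2x2 contingency table with a nonzero total."""
    (a, b), (c, d) = table
    total = a + b + c + d
    row_sums = (a + b, c + d)
    col_sums = (a + c, b + d)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_sums[i] * col_sums[j] / total
                g += observed * math.log(observed / expected)
    g = max(0.0, 2.0 * g)  # guard against tiny negative rounding error
    # For one degree of freedom, P(chi-square > g) = erfc(sqrt(g / 2)).
    p = math.erfc(math.sqrt(g / 2.0))
    return g, p
```

On a made-up table ((30, 10), (10, 30)) every expected count is 20, so G is about 20.93 and p falls well below .001; a table with four equal cells gives G = 0 and p = 1.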

[...] some failure [...]
recovery actions (R), failures (F) or pairs of a failure and the recovery action that
repaired it (FR). [...] dependencies in all three types of [...]
of overlap: we observe dependencies involving both a particular action itself (e.g., [...]) and
the action in combination with some failure (e.g., F_j R_a). To distinguish whether the action
itself or the combination best describes the relationship, we can apply a variant of the G-test to
determine the contribution of each combination to the effect of the action itself. In this way,
we can reduce a large set of overlapping dependencies to a smaller set of mutually exclusive
dependencies.

Step 3: Identify Suspect Plan Structures. The designer selects one of the dependencies
for further attention and tries to determine how the planner's actions might have related to
the observed dependency. The dependencies are mapped to actions in plans and then the plans
are searched for structures that involve the actions and that are known to be susceptible to
failure. These structures are called suggestive structures because they are suggestive of possible
bugs; they are language mechanisms used to coordinate actions and can be [...]
properly. Currently, FRA includes seven suggestive structures.

For example, to associate the dependency identified in Step 2 with plan actions,
we determine what actions are added by R_sp and what actions detect F_ip [...].
In this case, R_sp [...] one of three projection actions [...],
and F_ip is detected by a monitoring action
called [...].

To find suggestive structures, the plan library is searched for all plans containing [...]
combination of the actions found in the dependency. Each such plan is checked for [...]
structures that involve the actions in the dependency. In the example, [...]
actions [...] and the monitoring action [...].
All three indirect attack plans [...]
structures: [...] and [...].

Figure [...]: Mapping a dependency to two suggestive structures.

Sequential ordering is an enabling condition: if one action is guaranteed to come before
another, then the first action has the potential to influence the second. Shared variable
requires that every action that references the variable agrees about how it is set and used. If
some of the assumptions about the variable's use are implicit or under-specified, the variable might
be a source of failure: one action might assume that the variable's units are
minutes and another [...].

Step 4: [...]. In CHEF [7], [...]
the steps and states [...]
the interactions be[...] in different [...]
encompass the [...] of the failure [...]
of the plan and the strategies for repair. Suggestive structures [...].
Consequently, suggestive structures can be [...]
of failures and indicate different repairs.
FRA is a bit like computing a z-test in statistics with a calculator: the calculator computes the
z value (for FRA, a program computes the dependency set and searches for the structures in
the plan library) and you look up the significance level in a table (for FRA, the explanations
and modifications).

Continuing with the example, the two suggestive structures found for the dependency
[R_sp, F_ip] underlie two different explanations:

Implicit Assumptions. Two actions make different assumptions about the value of a plan
variable to the extent that the later action's requirements for successful execution are
violated. This can be fixed by adding new variables to the plan description to make the
assumptions explicit or changing the plans so that the incompatible actions are not used
in the same plans.

Band-aid Solutions. A recovery action may repair the immediate failure, but the failure may
be symptomatic of a deeper problem, which leads to subsequent failures. This can be fixed
by limiting how the recovery action is used or substituting a new recovery action.

The shared variable can cause a failure if the substituted projection calculates
the variable differently than was expected by the envelope action; the [...] may not [...]
specified well enough to be [...] violate monitoring [...] assumptions about
acceptable progress. Alternatively, the recovery action could lead to F_ip [...] the recovery
action is repairing only a symptom of a deeper failure; the fire may be raging out of control or
the available resources may really be inadequate for the task.

The explanations amount to sketches of what might have gone wrong. They cannot
determine the cause, but rather attempt to provide enough evidence [...]
of the planner [...] decide how to fix the [...]
about what caused the failure. Repeating the cycle tests the hypothesis: in the next cycle, the
designer can search for more flaws to fix and can determine whether the modification achieved
the desired effect: the observed dependency disappeared and the incidence of failure changed
for the better.

3.2  Evaluating  Failure  Recovery  Analysis 

We evaluated FRA in two ways: by demonstrating its utility in finding and fixing a bug in
the Phoenix plan library and by testing the effect of sample size (size of the execution traces)
on the results. The demonstration shows that FRA is useful, and the modifications based on it
are effective at reducing the incidence of a particular failure. Sample size is important because
dependency detection is a statistical technique and because these traces can be expensive to
collect.

3.2.1 Completing a Cycle of FRA

Section 3.1 described FRA in the context of an example: explaining why failure F_ip tends
to follow recovery action R_sp. Modifications to the Phoenix planner have been [...]
based on th[...] and tested by starting another cycle of FRA. [...]
show that the modifications improve the planner's performance.

The modification was one of two suggested by FRA. The first, which fixes a band-aid solution,
[...] limiting the application of the suspect recovery action. In this case, the recovery action
had been added to improve recovery performance in two expensive failures; [...]
set performance back to previous levels. The other modification [...] intended to [...]
implicit assumptions. This modification has two parts: first, check how the [...]
and the monitoring [...] the variable [...]
can use the assumptions.

The second modification was implemented: checking the code for the projection actions and
the monitoring action showed that the values set by the three projection actions varied widely.
The monitoring action uses summaries of the resources' capabilities, set by the projection
actions, to construct expectations of progress for the plan. The three projection actions differed
in what capabilities were included in those summaries (e.g., rate of building fireline, rate of travel
to the fire, startup times for new instructions, and refueling overhead). Because the monitoring
action assumed that the summaries reflected only the rate of building fireline, the conditions
for signaling failures effectively varied among the different projection actions. To accommodate
these differences, the projection actions were restructured to set separate variables for each of
the capabilities; the monitoring action then combines the separate variables to define expected
progress. In addition, a few minor bugs were fixed in the calculation of the summaries.

Tl;e  tnodified  jilanner  was  tested  in  ST  trials  of  the  same  experinmni  'em;)  U'evl  ior 
earlier  tliree  e;<peiiiuents  and  analyzed  for  dependencies.  If  tlie  reconunended  nui'tifira; r 'u  v,-:..' 
a|)])ropriat e,  the  ob'eiM'd  I/i’,  depen<li>ncy  'honld  liave  dt'aiipea red .  a'  w'l:  dther  di.-- 

))<Miileiirie'  i  i  iiv  pi  1 1  lee!  ion  act  ioii'  a  n<l  I  he  nionii  oi  inv  act  a  •n .  In  f.ici .  all  Ion ;  ■  >)  •  ii  ■■  i  . 

llelK  lev  in\  1  I'j'.  illV  pioi'S  !  lol,  „(  llollv  and  t  fie  e||  \  el(  I|  1  |.  ,;i  ' .  I  ‘  e,  .  v  i  \  . .  /;,  .  / 
'.’xec  III  ion  lime.,  .\iiiii:  ioii.div.  l;\iau  t  lie  calcnlai  ion  br,'.r'  .im;  :  ■•-i :  mi  n  1 1  nu  ii,.'  .•a'lmi'  asj  lo
a  Inwei'  iiicnience  of  a  eeneial  f.nluie  to  calculate  projeriKn,'  i  /h,  j;  ,  .nronnieii  nn  jii.s  ,.' 
of  the  failnies  in  tin’  pieviuu'  e.xjieriinent  and  <)nl>'  ni  :..i'  e\ poi  iineni .  p,\  o.ji,,;:  ,ir.r  a 

ll\))Ot  he'i/e<i  can-e  of  f.iihlie.  one  \Mmld  als<r  expert  I  fie  o'.e; 1 I  e  ,  ,f  1,,  lb;  1  e-  1  ( I  .iei  h:;e,  I  i.e 
interesting for two reasons: First, there is a long time delay between the execution of the
recovery action and the failure detection, which means that we had not thought to check the
much earlier event as a source of failures or bugs. Second, the projection actions (the actions
added by R^p) involve extremely complex code, which is difficult to debug. Consequently, the
information provided by FRA was useful in tracking down bugs in the projection code.

3.2.2  Sensitivity of Dependency Detection to the Size of Execution Traces

Execution traces are often expensive to collect. Consequently, much of the effort required
to execute dependency detection is expended collecting execution traces. We expect that the
results of dependency detection will vary based on how many execution traces we collect: the
total number of patterns (i.e., possible combinations of different types of precursors and failures)
and the ratios of the patterns (i.e., the ratio of the counts in the first column to the counts in
the second column) in a contingency table influence the results of the G-test. To determine
how the size and number of execution traces collected influence the results of the G-test, we
need to answer two questions: How does the value of G change as the number of patterns in
the  evcniiioii  ii.'ice^  ijiC! e,',~e.''  How  doe^  the  value  of  (,I  change  :!ie  I'lerui'.tu  in  ii. 

I  K  (  II  n  eiH  e  (  1  I',,  t  lie  I  at  III  of  1  l.e  Iippei  1  ivht  to  uppef  left  cel'.  ;|,  |l . '  i )  1 V  e) ,  i  \  I.,;,';,  i 

fl  I  nil  I  lie  1 1'.,  I  ol  the  I  \ei  ill  i.  Ill  I  I  .u'e'  I  i.e.,  the  rat  io  <  >f  the  !uV  et  !,  e-  i ,  ,  ;  •  ji  i\\  e]  n  V  i ,  I  i  ei ;  -  i 

the contingency table)? The first question addresses the sensitivity of the test to the size of
the execution traces; the second addresses the sensitivity to noise: how much of a difference
is required to detect a dependency?

G-Test Sensitivity to Execution Trace Size  We selected the G-test over the more common
Chi-square test because the G-test is additive. Additivity means that if the ratios remain the
same but the total number of counts in the contingency table doubles, then the G value for the
contingency table doubles as well. Consequently, given execution traces with few patterns, the
G-test can find strong dependencies, but given more patterns, it will also find rare dependencies.
If a user of FRA is interested in detecting any dependencies, then a few execution traces will
be adequate to do so; if the user wishes to find rare or obscure dependencies, then it will be
necessary to gather more execution traces. The level of effort expended in gathering execution
traces depends on what kinds of dependencies one wishes to find.
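
The additivity property can be checked numerically. The following is a minimal sketch, not the FRA implementation (which the report describes as a set of Lisp functions): it computes the standard G statistic, G = 2 * sum(O * ln(O / E)), for a contingency table and compares a table with a copy whose counts are all doubled. The function name `g_statistic` and the counts are hypothetical, chosen only for illustration.

```python
import math


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:  # a zero cell contributes nothing to the sum
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


# Hypothetical counts (not Phoenix data): rows are "precursor present / absent",
# columns are "failure follows / does not follow".
small = [[10, 5], [20, 40]]
doubled = [[2 * count for count in row] for row in small]

g_small = g_statistic(small)
g_doubled = g_statistic(doubled)
print(g_small, g_doubled, g_doubled / g_small)
```

Doubling every count doubles each expected count as well, so every ratio O / E is unchanged and only the multiplier O doubles; the ratio of the two G values is therefore 2 up to floating-point error.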

G-Test Sensitivity to Noise  We know that the value of G increases linearly with increases
in the number of patterns in the execution traces, but only if the ratios in the contingency table
remain the same as the number of patterns increases. In trying to decide how many execution
traces to gather, we also need to know whether the results will be vulnerable to noise, which
is more apparent with few patterns. We can evaluate empirically whether, in practice, getting
slightly more or fewer execution traces would have significantly changed which dependencies
were detected in execution traces gathered from Phoenix.

\N'.'  ii  \  i(,  i.oi'O  by  "t  wo.ikiiiu"  I  li‘>  fi ''(p|i‘ur\'  <nniii''  found  in  ;  n*'  d.oa  'o 

(j1  iIm'  doji. II'  i<’~  would  liol  i,.:M  i . .  .i.'t|.(!,.d  l!  ;i;o  lOUllI-  !li  low  1.;:..  | , 

I  Im '  (  .  i;,;  I '  1 .1  bi'’  n  d  '  .1  '  jiial!  ;i iiioii  ui  1  .  .|  n  m  1  <  i{  e'xi.i  n  j ..  ,1,  1  1  .k  1  •  i  . ,  .m; 

'ho  1  . . . .  IHi'Uil  -  phi-  •  ,  '•-!  iioiii  lie-  lou  J  I  ),  ox  JI.U  iim-Iil  do'i  ]  ||.  |  y,.  • 

! ''•  *’,1 1,  ■ ;  u  t!i>>  \aiui'''  ii’-'uIwn;  ;;i  ,1  lu.-'’-  of  .'iboiil  .{”1',,'^  of  I  lie  (i<'p<’ii(l>'iici'"'  iouiid  [>i<wii>U'0.  J:i 
oi  !i>’r  word,-,  for  the  eata  from  Phoenix.  (h'pi'iid<'MC\  doiection  i'  ^i‘n''i'i\<'  lo  '■in.dl  a, 
The implication of the sensitivity of dependency detection to noise in the execution traces
is that rare patterns are especially sensitive to noise and so should be viewed skeptically. One
must interpret the results of dependency detection with care: if "sensitive" dependencies are
discarded, then rare events may remain undetected; at the same time, one does not wish to
chase chimeras. Interpreting dependencies requires weighing false positives against misses. If we
are trying to identify dependencies between precursors that occur rarely or failures that occur
rarely, then additional effort should be expended to get enough execution traces to ensure that
the dependency is not due to noise.
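
One way to picture this kind of sensitivity check is sketched below. It is an illustration under stated assumptions, not the procedure used in the experiments: each cell of a 2x2 table is changed by plus or minus one count, and the table is called robust only if the verdict (G above or below a critical value) never flips. The critical value 3.841 is the usual chi-square value for one degree of freedom at the 0.05 level; the function names and the counts are hypothetical.

```python
import math
from itertools import product

CRITICAL_G = 3.841  # chi-square critical value, 1 degree of freedom, alpha = 0.05


def g_statistic(table):
    """G = 2 * sum(O * ln(O / E)) for a contingency table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    g = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            if observed > 0:
                expected = row_totals[i] * col_totals[j] / total
                g += observed * math.log(observed / expected)
    return 2.0 * g


def is_robust(table, tweak=1):
    """True if the verdict survives every +/- tweak applied to a single cell."""
    verdict = g_statistic(table) > CRITICAL_G
    for i, j in product(range(len(table)), range(len(table[0]))):
        for delta in (-tweak, tweak):
            if table[i][j] + delta < 0:
                continue  # counts cannot go negative
            perturbed = [list(row) for row in table]
            perturbed[i][j] += delta
            if (g_statistic(perturbed) > CRITICAL_G) != verdict:
                return False
    return True


# Hypothetical tables (not Phoenix data): a frequent pattern and a rare one.
frequent_pattern = [[100, 50], [200, 400]]
rare_pattern = [[3, 1], [20, 40]]

print(is_robust(frequent_pattern), is_robust(rare_pattern))
```

In this sketch the rare pattern changes verdict when a single count is added to its first cell, while the frequent pattern does not, which is the behaviour the paragraph above warns about.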


3.3  Summary 


The  liurpose  of  Failure  Recovery  .\nalysis  is  to  identify  cases  in  which  i>laus  ma\'  i 
exaceri)a;e  or  cause  failure.  There  are  two  reasons  why  analyzing  failure  reco\su\'  : 
'\a\'  ro  •.'-reriniiie  wliy  the  planner  fails.  First,  failure  recovery  inflnenc'^s  v.hirh  i'aiin: 
Minor  cn.-'.nees  in  the  design  of  latiure  recovery  ])roduce  significant  cltan.u'*'  iti  i,!;; 

t;>pes  of  failures.  Secoii<l.  failuie  recovtuv  U'es  plans  in  wavs  not  expilcitix  M:* . 

''.‘'-'iut.e; but  not  frubnidcn  or  prevout ed  b\'  tlioni  t'itlier.  railuir  ref(,'..-i\  si  ;).-.;: 
■•'idir.g  <■;  i epj.K  ing  |)o;t|(in-  of  tj.aui.  A'  a  i''^iilt,  ilic  phaii  ni.o  iin  inde  i  i.^n  ;..  lo  ■ 
a.  1  e  V-. jvi.d  ill  oilier-  and  'ont.-vi  uot  <Mi\i'ioned  1)\  t  he  . 
i  ;.;')-.t  i  nt  el  e- ;  j  ji  e  featnie  l-'.  i  IvA  I-  t  lie  integration  of  kleiV,  n  liMe  ,
results.

4  Future  Work 

The FRA procedure promises to improve planner reliability and to expedite the development
of plan knowledge bases for new environments by assisting designers in debugging knowledge
bases. However, at its present stage of development, FRA is limited in several ways: the
procedure is only partially automated and is implemented as a loosely organized set of Lisp
functions, execution traces contain only failures and recovery actions, dependencies include
only temporally adjacent precursors and failures, and the procedure has been tested only on
the Phoenix planner.

Future research will address these limitations by "closing the loop" of gathering and
analyzing execution data and by generalizing FRA to a broader range of bugs and to another
planner. Closing the loop refers to integrating all of the tools necessary to support complete
testing, analysis and repair of a planner during its development process. The designer will still
direct the process, but will do so by selecting from sets of pre-defined experiment scenarios and
from tools for performing analysis of the execution data. Generalizing FRA refers to

expanding the sort of bugs that can be identified and applying the procedure to another planner.
The set of suggestive structures and explanations needs to be enhanced, especially when FRA
is applied to another planner and environment. The range and nature of dependencies needs
to be further explored as well; for example, if dependencies reflect the interaction between the
planner and its environment, will similar environments and planners result in similar patterns of
dependencies? Dependencies may prove to be a metric of similarity between planners.
5  Conclusion

Certain software systems, so called ambitious systems ('•], are prone to failure. These include
systems being developed for novel or unfamiliar tasks, systems in unpredictable environments,
or systems with organizational complexity. Failure is a consequence of complexity in the
environment or the software and the fact that our facility in constructing complex systems has
surpassed our ability to understand their behavior. Consequently, the software most likely to
fail is that which is also hardest to understand and to debug.

The goal of the described research is to reduce the impact and likelihood of failures that
result from a lack of understanding about how an AI planner will perform. Failure recovery
provides a safety net for catching failures that cannot be avoided easily; the incremental
iiK'tliccIology  ?  bnildkig  failure  recovery  suited  to  a  particular  planner  and  :*<  .'nvi- 

roniuent.  Failure  Recovery  Analysis  helps  programmers  to  ()<>)>!)£;  planner-  under 
because  it  re<iuires  only  a  we.-ik  model  of  how  they  irerform  ami  relie-  on  s'atiMical 
the  execution  of  the  plannei.  Together,  these  ai)])roarhes  h.axe  h*'im  deiuoii't  r.ii  I'd  n  nnov''.-' 
the  I eliahilit y  and  the  jj'']!' •:  in.oice  of  the  Phoenix  phanm-i  .  <d\"n  i;o\>,  ;.  \\ 
mad*'  alxiiii  the  planmn  a  ini  -  “nv  ironuient .  )'  Iv  A  pi  omi'-''  i  o  ph-  , :  •  ■  -  . 
An Empirical Method for Constructing Slack Time Envelopes


Early  Warnings  of  Plan  Failure,  False  Positives  and  Envelopes: 

Experiments  and  a  Model 

Paul  R.  Cohen,  Robert  St.  Amant,  David  M.  Hart 


Abstract 

We analyze a tradeoff between early warnings of plan
failures and false positives. In general, a decision rule
that provides earlier warnings will also produce more
false positives. Slack time envelopes are decision rules
that warn of plan failures in our Phoenix system. Until
now, they have been constructed according to ad hoc
criteria. In this paper we show that good performance
under different criteria can be achieved by slack time
envelopes throughout the course of a plan, even though
envelopes are very simple decision rules. We also
develop a probabilistic model of plan progress, from
which we derive an algorithm for constructing slack
time envelopes that achieve desired tradeoffs between
early warnings and false positives.

1  Introduction 

Underlying the judgment that a plan will not succeed is
a fundamental tradeoff between the cost of an incorrect
decision and the cost of evidence that might improve the
decision. For concreteness, let's say a plan succeeds if a
vehicle arrives at its destination by a deadline, and fails
otherwise. At any point in a plan we can correctly or
incorrectly predict that the plan will succeed or fail. If
we predict early in the plan that it will fail, and it
eventually fails, then we have a hit, but if the plan
eventually succeeds we have a false positive. False positives
might be expensive if they lead to replanning. In
general, the false positive rate decreases over time (e.g.,
very few predictions made immediately before the
deadline will be false positives) but the reduction in false
positives must be balanced against the cost of waiting
to detect failures. Ideally, we want to accurately predict
failures as early as possible; in practice, we can have
accuracy or early warnings but not both.

The false positive rate for a decision rule that
at time t predicts failure will generally decrease as t
increases. We analyze this tradeoff in several ways.
First, we describe a very simple decision rule, called a
slack time envelope, that we have used for years in the
Phoenix planner (Sections 2 and 3). Then, using
empirical data from Phoenix, we evaluate the false
positive rate for envelopes and show that envelopes can
maintain good performance throughout a plan (Section
4). An infinite number of slack time envelopes can be
constructed for any plan, and the analysis in Section 4
depends on "good" envelopes constructed by hand. To
be generally useful, envelopes should be constructed
automatically. This requires a formal model of the
tradeoff between when a failure is predicted (earlier is
better) and the false positive rate of the prediction
(Section 5). Finally we show how the conditional
probability of a plan failure given the state of the plan
can be used to construct "warning" envelopes.

2  Slack  Time  Envelopes 

Imagine a plan that requires a vehicle to drive 10 km in
10 minutes. Figure 1 shows progress for three possible
paths that the vehicle might follow, labeled A, B and C.
Case A is successful: the vehicle makes rapid progress
until time 3, then slows down from time 3 to time 4,
then makes rapid progress until time 8, when it
completes the plan. Case B is unsuccessful: progress is
slow until time 4, and slower after that, and the required
distance is not covered by the deadline.

The solid, heavy line is a slack time envelope
for this problem. Our Phoenix planner (Cohen et al.,
1989; Hart, Anderson, & Cohen, 1990) constructs such
an envelope for every plan and checks at each time
interval to see whether the progress of a plan is within
the envelope. Case A remains within the envelope
until completion; case B violates the envelope at time 6.

[Figure 1. Plot of distance remaining (vertical axis) against time (horizontal axis), with the slack time marked.]

F49620-89-C-0113,  Final  Technical  Report 


Paul  R.  Cohen 


When an envelope violation occurs, the
Phoenix planner modifies or completely replaces its
plan. It should not wait until the deadline has expired
to begin, but should start replanning as soon as it is
reasonably sure that the plan will fail. Clearly,
envelopes can provide early warning of plan failure; for
example, in case B, the envelope warned at time 6 that
the plan would fail. The problem is that progress
might pick up after an envelope violation, as shown in
case C. At time 5 the envelope is violated, but by time
8, the plan is back within the envelope. If in this case
the Phoenix planner abandoned its plan at time 5, it
would have incurred needless replanning costs. Case C
is a false positive as we defined it earlier: a plan
predicted to fail that actually will succeed. Note that a
different envelope, shown by the heavy dotted line, will
avoid this problem. Unfortunately, it doesn't detect the
true failure of case B until time 8, two minutes after the
previous envelope. This illustrates the tradeoff between
early warnings and false positives. (This and other
concepts in the paper derive from signal detection
theory, e.g., (McNicol, 1972; Coombs, Dawes, &
Tversky, 1970).)

Slack  time  envelopes  get  their  name  from  the 
period  of  no  progress  that  they  permit  at  the  beginning 
of a plan. The Phoenix planner adds slack time to
envelopes  so  that  plans  will  have  an  opportunity  to 
progress  before  they  are  abandoned  for  lack  of  progress. 
Until  recently,  this  was  all  the  justification  for 
envelopes  we  could  offer.  In  the  following  sections, 
however,  we  show  why  the  simple  linear  form  of 
envelopes  achieves  high  performance,  and  how  to  select 
a  value  of  slack  time. 
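
The decision rule described in this section can be sketched in a few lines. This is an illustration only, assuming an envelope that allows no progress during the slack period and then falls in a straight line to zero distance at the deadline. The function names and the three paths are hypothetical: the paths are invented to behave like cases A, B and C in the text and are not read from Figure 1.

```python
def envelope_threshold(t, start_distance, deadline, slack):
    """Largest distance remaining allowed at time t without violating the envelope."""
    if t <= slack:
        return start_distance  # no progress is required during the slack period
    if t >= deadline:
        return 0.0
    # Straight line from (slack, start_distance) down to (deadline, 0).
    return start_distance * (deadline - t) / (deadline - slack)


def first_violation(path, start_distance, deadline, slack):
    """Return the first time at which distance remaining exceeds the envelope, or None."""
    for t, remaining in enumerate(path):
        if remaining > envelope_threshold(t, start_distance, deadline, slack):
            return t
    return None


# Hypothetical distance-remaining values (km) at times 0..10 for a 10 km, 10 minute plan.
path_a = [10, 8, 6, 4, 3.8, 2.5, 1.5, 0.5, 0, 0, 0]        # succeeds, never violates
path_b = [10, 9, 8, 7, 6.5, 6, 5.6, 5.3, 5, 4.8, 4.6]      # fails
path_c = [10, 9.5, 9, 8.5, 7.5, 6.5, 4.5, 2.5, 1, 0.5, 0]  # slow start, then succeeds

for slack in (2, 5):
    print(slack, [first_violation(p, 10, 10, slack) for p in (path_a, path_b, path_c)])
```

With these invented paths, the tight envelope (slack 2) flags path B at time 6 but also flags the eventually successful path C at time 5, while the loose envelope (slack 5) spares C but does not flag B until time 8.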

3  The  Data  Set 

One way to evaluate slack time envelopes is to generate
hundreds of plans, monitor their execution at regular
intervals, and, at each interval, use an envelope to
predict success or failure. We generated 1139 travel plans,
or paths, for vehicles in our Phoenix simulation.
Phoenix is based on a machine-readable map of
Yellowstone National Park that includes roads, obstacles, a
variety of elevations and ground covers, and other terrain
features. The Phoenix planner fights simulated forest
fires in this environment by surrounding the fires with
fireline built by bulldozers. Envelopes in Phoenix
monitor fire spread rate, fireline digging, and progress
in different bulldozer tasks. The focus of this paper,
however, is a simpler problem: getting from one point
on the map to another by a deadline. To generate our
data set, we repeatedly selected pairs of points 70 km
apart as the crow flies and asked the Phoenix planner to
construct a path between each. Then we simulated
the traversal of each path, monitoring it every 1000
simulated seconds. At each monitoring step we
estimated the distance remaining to the destination.
Because of obstacles, terrain, and so on, the distribution
of remaining distances at a given monitoring interval


Figure 2. How we generated distributions of DR for
successes and failures at each time interval.


was considerable (including many greater than 70 km).
For example, after 5000 seconds, the mean remaining
distance was about 54 km with a range of 13.6 to 79.1
km. We generated, executed and monitored 1139 paths
in this manner.

3.1  Distributions  of  Eventual  Successes  and 
Failures  Before  the  Deadline 

We chose a deadline of 15,000 seconds to divide the
paths into two groups: paths that reached their goals by
the deadline were called successes, and those that did not
were called failures. Of 1139 paths, 654 succeeded and
485 failed. We looked at each path 15 times, once
every 1000 seconds, and recorded an estimate of the
number of "distance units" remaining to the goal. For a
variety of reasons, a distance unit is 2 km, so the
distance remaining to the goal, abbreviated DR, is 35 at
the beginning of the plan and zero for successful paths
at the end of the plan. Henceforth, we use "time x" as
shorthand for "x thousand seconds elapsed." For
example, in Figure 2, at time 4, all the paths with DR = IS
are failures; at time 5, all the paths with 26 < DR < 28
are failures; but at time 5, DR = 25, three paths are
successes and two are failures.

We plotted frequency polygons for DR for
successes and failures at each of the 15 time intervals.
Figure 3 shows the distribution of successes and failures
at time 5. Note that most failures still have a long way
to travel at time 5: the bulk of the distribution lies to
the right of DR = 30 (the mean DR for failures at time
5 is 33). The distribution of successes, however, is
made up of paths with relatively short remaining
distances to the goal (mean DR = 1 . i.

3.2 Empirical Hit Rates and False Positive
Rates for DR Thresholds

Let's predict that a path will fail to reach its goal by the
deadline if, at time 5, the remaining distance to the goal
is 30 or more; that is, the DR threshold is 30.
Figure 3. Frequency polygons for DR at time 5.


The dark shaded area represents false positive
errors, paths we predict will fail but that eventually
succeed. Of the 654 paths that eventually succeed, 37 lie
in the dark shaded area; the probability of a false
positive is therefore 37/654 = 0.056. The light shaded area
represents hits, paths that are predicted to fail and that
actually do fail. The probability of a hit is 261/485 =
0.538. The ratio of the probabilities is 9.60.

At time 5, the threshold DR > 30 seems pretty
good because the ratio of hit probability to false
positive probability is high, but we cannot say it is the best
threshold unless we know the relative values of hits and
false positives.
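
The arithmetic behind these probabilities can be reproduced directly from the counts quoted in the text (261 of 485 failures and 37 of 654 successes beyond the threshold). The function name `rates` is hypothetical; the sketch only restates the definitions used above.

```python
def rates(hits, failures, false_positives, successes):
    """Return (hit probability, false positive probability, their ratio)."""
    hit_probability = hits / failures
    false_positive_probability = false_positives / successes
    return (hit_probability,
            false_positive_probability,
            hit_probability / false_positive_probability)


hit_p, fp_p, ratio = rates(hits=261, failures=485, false_positives=37, successes=654)
print(f"hit {hit_p:.3f}  false positive {fp_p:.4f}  ratio {ratio:.2f}")
```

Computed without intermediate rounding, the ratio comes out near 9.5; dividing the rounded figures 0.538 and 0.056 quoted in the text gives a value close to the 9.60 reported there.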

This is just part of the analysis of our data set.
In particular, we haven't shown our analyses of success
and failure distributions at other times; nor hit and false
positive probabilities for different DR thresholds at
different times; nor success and failure distributions for
stricter or more lenient deadlines. We can summarize
these analyses as follows: At later time intervals, the
success distribution is increasingly right skewed, with
most of its mass around low values of DR. At
intermediate time intervals (e.g., time = 7) the failure
distribution is roughly uniform. Later, it is right
skewed like the success distribution, but with more
mass in its tail than the success distribution. Shifting
the DR threshold to the right decreases both hits and
false positives, though false positives decrease faster (as
in Fig. 5, only more so at later time intervals). These
patterns hold for stricter and more lenient deadlines; the
main effect of a stricter deadline is to reduce the number
of successes. The following evaluations of envelopes
are based on the 15,000 second deadline illustrated above
because it produces a nearly even split between
successes (654, total) and failures (485, total).

4. Evaluation of Slack Time Envelopes

Slack time envelopes are decision rules for predicting
whether paths will succeed or fail. As illustrated in
Figure 1, if the path is within the boundary of an envelope
at a particular time, then we predict success, otherwise
we predict failure. Each point on an envelope boundary
specifies a DR threshold for a particular time, and so
has an associated hit rate and false positive rate. In this
section we use slack time envelopes to predict whether
paths in our data set, discussed above, will succeed or
fail. We evaluate the predictive performance of slack
time envelopes according to this criterion: An envelope
should provide performance approaching optimal
throughout a plan.

This depends of course on our definition of
optimal. Consider a decision rule based on distance
remaining (dr) to the goal at time t:

If l(dr, t) > β(t), then predict plan failure

where

$$l(dr, t) = \frac{\Pr(dr \mid \text{plan fails}, t)}{\Pr(dr \mid \text{plan succeeds}, t)}$$

Intuitively, we have an observation of dr at
time t, and we must decide whether this observation has
been produced by an eventual success or failure. We
base our decision on whether the likelihood is greater
than the threshold β(t). A basic result from signal
detection theory is that the utility of this decision is
maximized if

$$\beta(t) = \frac{\Pr(\text{plan succeeds}, t)}{\Pr(\text{plan fails}, t)} \cdot \mathit{Payoff}(t)$$

where

$$\mathit{Payoff}(t) = \frac{\mathrm{Val}(\text{correct rej}, t) + \mathrm{Cost}(\text{false pos}, t)}{\mathrm{Val}(\text{hit}, t) + \mathrm{Cost}(\text{miss}, t)}$$

In the simplest case, β(t) is constant over the
course of a plan. A more realistic assessment requires
analysis of the terms in β(t). The first term, the prior
probabilities, decreases with time, as plans begin to
succeed. The second term, Payoff(t), determines the
relative importance of hits, false positives, correct
rejections, and misses. The value of correctly predicting a
plan failure decreases over time; early warnings are
worth more. At the same time the cost of a false
positive increases over time; if we are going to
unnecessarily abandon a plan, it is better to do so early
in the plan than later when we have invested a lot of
time in the plan. It is more difficult to assess the value
of a correct rejection and the cost of a miss. If we
assume they are constant relative to the other
parameters, then the value of the second term in β(t)
increases over time. We consider cases in which
Payoff(t) is constant and also cases in which it
increases linearly with time.
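
The decision rule in this section can be written out as a short sketch. It assumes the standard signal detection form, in which the threshold is the ratio of prior probabilities multiplied by the payoff term, and it estimates the likelihood ratio from counts of paths observed at each DR value at one monitoring time. All function names and the DR counts are hypothetical; only the totals 654 and 485 come from the text, and they are used here as stand-in priors.

```python
def payoff(val_correct_rej, cost_false_pos, val_hit, cost_miss):
    """Payoff(t): relative importance of the four decision outcomes."""
    return (val_correct_rej + cost_false_pos) / (val_hit + cost_miss)


def threshold(p_success, p_fail, payoff_value):
    """Decision threshold: ratio of prior probabilities times Payoff(t)."""
    return (p_success / p_fail) * payoff_value


def likelihood_ratio(dr, failure_counts, success_counts):
    """Pr(dr | plan fails, t) / Pr(dr | plan succeeds, t), estimated from counts."""
    p_fail = failure_counts.get(dr, 0) / sum(failure_counts.values())
    p_succ = success_counts.get(dr, 0) / sum(success_counts.values())
    if p_fail == 0 and p_succ == 0:
        raise ValueError(f"no paths observed with DR = {dr}")
    if p_succ == 0:
        return float("inf")  # only failures were observed at this DR
    return p_fail / p_succ


def predict_failure(dr, failure_counts, success_counts, beta):
    """Predict plan failure when the likelihood ratio exceeds the threshold."""
    return likelihood_ratio(dr, failure_counts, success_counts) > beta


# Hypothetical counts of paths by DR at one monitoring time (not the Phoenix data).
failure_counts = {25: 2, 26: 3, 27: 5}
success_counts = {20: 6, 24: 1, 25: 3}

# Stand-in priors from the overall split of 654 successes and 485 failures; Payoff(t) = 1.
beta = threshold(654 / 1139, 485 / 1139, payoff(1, 1, 1, 1))

print(predict_failure(25, failure_counts, success_counts, beta))
print(predict_failure(26, failure_counts, success_counts, beta))
```

Making Payoff(t) grow with time raises the threshold, so the same observation becomes less likely to trigger a failure prediction later in the plan, in line with the discussion above.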
Figure 4. Hand-constructed slack time envelopes
superimposed on constant Payoff(t) contours.


Figure 5. Slack time envelopes on linear Payoff(t)
contours.

4.1. Comparing Slack Time Envelopes with
Empirical Utility Contours

Because envelopes are just straight lines, it is unclear
whether they can satisfy our optimal performance
criterion. In particular, for constant or linear payoff
functions, the DR threshold required to maintain a constant
ratio of hit probability to false positive probability
might not change linearly over time. To find out, we
calculated utility contours from the empirical data for
different Payoff(t), as shown in Figures 4 and 5. A
contour represents a fixed Payoff function; each point on
a contour is the DR threshold (y axis) that is required at
a particular time (x axis) to ensure that the utility of the
decision  is  rna'ti.Titeel  F:."jre  4  we  let  Pj\o;fin  be 
const.ant  at  1  and  :  '.n  .-igure  we  let  Pjyaflt) 
vary  .as  a  tutictif  n  ' :  ; 

An important characteristic of these contours
is that they require high DR thresholds for the first few
time intervals, and gradually smaller thresholds for
later time intervals. The contours are roughly linear,
which suggests that a slack time envelope, fit to one of
these contours, can provide performance approaching
optimal, given our payoff function. For our data
set, at least, Figures 4 and 5 tell us that an envelope
can be constructed to satisfy our performance criterion.
We will formalize this result in the next section.

5.  Constructing  Slack  Time  Envelopes: 
How  Much  Slack? 

Our focus now turns to the task of constructing slack
time envelopes. We assume that the end points of the
envelope are the distance to the goal and the deadline, so
the only parameter is how much slack time to allow.
Next we present a model that predicts utility for
different values of slack time. The model also predicts the
early warning premium for values of slack time. Early
warning premiums accrue when, by constructing a tight
envelope with little slack time, we detect failures earlier
than we would with a looser envelope. Empirically,
early warning premiums come at the expense of false
positives. We assess a cost for each hit proportional to
the time interval in which it is detected; this places a
premium on early hits. We assess a constant cost for
each false positive. This is described further in the
following sections.
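
The cost scheme just described can be illustrated with a small sketch that scores an envelope for a given slack time. It assumes a linear envelope running from the end of the slack period to zero distance at the deadline, charges each detected failure a cost proportional to its detection time, and charges each false positive a constant. The function names, the cost constants and the three labeled paths are hypothetical.

```python
def envelope_threshold(t, start_distance, deadline, slack):
    """Largest distance remaining allowed at time t without violating the envelope."""
    if t <= slack:
        return start_distance
    if t >= deadline:
        return 0.0
    return start_distance * (deadline - t) / (deadline - slack)


def first_violation(path, start_distance, deadline, slack):
    """Return the first time at which distance remaining exceeds the envelope, or None."""
    for t, remaining in enumerate(path):
        if remaining > envelope_threshold(t, start_distance, deadline, slack):
            return t
    return None


def envelope_cost(paths, start_distance, deadline, slack,
                  hit_cost_per_interval=1.0, false_positive_cost=5.0):
    """Total cost of an envelope: time-proportional for hits, constant for false positives."""
    total = 0.0
    for path, succeeded in paths:
        t = first_violation(path, start_distance, deadline, slack)
        if t is None:
            continue  # never flagged: no cost under this scheme
        total += false_positive_cost if succeeded else hit_cost_per_interval * t
    return total


# Hypothetical labeled paths: (distance remaining at times 0..10, eventually succeeded?).
paths = [
    ([10, 8, 6, 4, 3.8, 2.5, 1.5, 0.5, 0, 0, 0], True),
    ([10, 9, 8, 7, 6.5, 6, 5.6, 5.3, 5, 4.8, 4.6], False),
    ([10, 9.5, 9, 8.5, 7.5, 6.5, 4.5, 2.5, 1, 0.5, 0], True),
]

for slack in (2, 5):
    print(slack,
          envelope_cost(paths, 10, 10, slack),
          envelope_cost(paths, 10, 10, slack, false_positive_cost=1.0))
```

With an expensive false positive the looser envelope is cheaper on these paths, and with a cheap one the tighter envelope wins, which is the tradeoff that the choice of slack time has to settle. Because the envelope reaches zero at the deadline, a path that fails is always flagged by the deadline at the latest, so no separate cost for misses is needed in this sketch.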

5.1.  A  Probabilistic  Model  of  Progress 

If we know the distributions of distance remaining (DR)
for successes at each time interval (e.g., those in Figure
3) then we can predict the false positive rate for a given
DR threshold. A simple model of the distribution of
DR begins with the assumption that in each time
interval a vehicle can progress at its maximum rate c with
probability p, or makes no progress at all with
probability q = 1 - p. Then the distribution of progress is
binomial, as shown in Table 1: the probability of
having made r units progress by time n is just the binomial
probability $\binom{n}{r} p^{r} q^{n-r}$.

For example, the probability of one unit of
progress by time 4 is 4pq^3 because there are four ways
to achieve this result, each with probability pq^3: we
could make no progress until time 3 (with probability
q^3) and then progress at the maximum rate for one time
unit (total probability pq^3), or we could make one
unit of progress by time 3 (with probability 3pq^2) and
then make no progress for the remaining time unit
(total probability 3pq^3). The sum of these options is
4pq^3.

Table 1. Binomial probabilities of progress by time.
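
The worked example above can be checked numerically. The following is a minimal sketch, assuming c = 1 so that progress is counted in whole units; the function name `progress_probability` is ours and does not appear in the original model.

```python
from math import comb


def progress_probability(n: int, r: int, p: float) -> float:
    """Probability of exactly r units of progress after n time steps.

    Each step yields one unit of progress with probability p and none
    with probability q = 1 - p (the binomial model of Section 5.1).
    """
    if r < 0 or r > n:
        return 0.0
    q = 1.0 - p
    return comb(n, r) * p**r * q ** (n - r)


# One unit of progress by time 4 should equal 4 * p * q^3.
p = 0.3
q = 1.0 - p
print(progress_probability(4, 1, p), 4 * p * q**3)
```

The two printed numbers agree because `comb(4, 1)` counts the four orderings enumerated in the text.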

The expected progress after N time units is
cNp and the variance is cNpq. If p = q = .5 then the
distributions of progress in each time interval are symmetric.
Otherwise the mass of the distribution at time N
tends toward cN (if p > q) or zero (if q > p). Important
characteristics of this model are that progress is linear
and variance changes linearly with time.
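
As a quick check of the linear growth of both moments, the sketch below computes the mean and variance of progress directly from the binomial probabilities. It assumes c = 1 (the value used later for Figure 6); `progress_moments` is a hypothetical helper name.

```python
from math import comb


def progress_moments(n: int, p: float) -> tuple[float, float]:
    """Mean and variance of progress after n steps, computed from the pmf (c = 1)."""
    q = 1.0 - p
    pmf = [comb(n, r) * p**r * q ** (n - r) for r in range(n + 1)]
    mean = sum(r * w for r, w in enumerate(pmf))
    variance = sum((r - mean) ** 2 * w for r, w in enumerate(pmf))
    return mean, variance


# Both moments scale linearly with the number of time steps.
for n in (10, 20, 40):
    print(n, progress_moments(n, 0.3))
```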

5.2  Utility  Contours  Using  the  Model 

This model explains the shape of utility contours and
slack time envelopes, and it predicts the probability of
false positives for a given envelope. Let us elaborate
the model a little: Our goal is to travel some distance
Dg by a deadline time Tg. At any time t, we can assess
the progress that has been made, D(t), the progress
that remains to be made, DR(t), and the time remaining,
TR = Tg - t. A success is defined as D(t) ≥ Dg and
t ≤ Tg. The conditional probability of a success given
DR(t), Dg and Tg is:

$$\Pr(\text{success} \mid DR(t)) = \sum_{r = DR(t)}^{TR} \binom{TR}{r} p^{r} q^{TR - r}$$

A similar equation holds for the conditional
probability of a failure. If Payoff(t) is, for example,
constant, this means that the ratio of these conditional
probabilities must be constant as well. Now
imagine that we have DR(t) distance remaining at time t
and we extrapolate forward TR time units to the deadline.
At this point we have a binomial distribution
with N = TR, divided into a portion below the DR = 0
line (the successes, those cases that have arrived by the
deadline) and a portion above the line (the failures). The
ratio of the areas of the two portions gives us the ratio
of the conditional probabilities. If we want to find at
each time the distance for which this ratio is constant,
we plot a constant z-score for distributions with N ranging
from Tg to 0.
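
The conditional probability of success and the failure-to-success ratio can be computed directly from the binomial model. This is a sketch under the assumption c = 1; the function names are ours, and the example state (15 units remaining with 25 time steps left) is only an illustration.

```python
from math import comb


def pr_success(dr: int, tr: int, p: float) -> float:
    """Pr(success | DR = dr) with tr time steps remaining.

    Success requires at least dr units of progress in the tr remaining steps.
    """
    q = 1.0 - p
    return sum(comb(tr, r) * p**r * q ** (tr - r) for r in range(max(dr, 0), tr + 1))


def failure_to_success_ratio(dr: int, tr: int, p: float) -> float:
    """Ratio Pr(failure) / Pr(success); infinite when success is impossible."""
    s = pr_success(dr, tr, p)
    return float("inf") if s == 0.0 else (1.0 - s) / s


# Example: 15 units remaining, 25 steps left, p = .5.
print(pr_success(15, 25, 0.5), failure_to_success_ratio(15, 25, 0.5))
```

Holding this ratio fixed while the time remaining varies traces out one contour of the kind described above.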

Figure 6 shows contours for constant Payoff(t).
Contours for comparable linear Payoff(t) are very
similar, with identical slack times, but a more pronounced
curve. To generate the figure we assumed Dg =
25, Tg = 50, p = .5, and c = 1, and applied the above
analysis to get conditional probabilities of success and
failure for every value of t.

Imagine that a vehicle has made 10 units of
progress at time 25, that is, DR(25) = 15, illustrated by
the large dot near the center of Figure 6. Because this
dot lies on the contour labeled Payoff(t) = 5, we know
that Pr(failure | DR(25) = 15) / Pr(success | DR(25)
= 15) = 5. If the vehicle makes no progress for another
five time units, then the dot would lie to the right of
the contour labeled Payoff(t) = 43, so the probability
ratio is much higher.

Figure 6. Contours of constant payoff from each point in
the space (DR plotted against time, 0 to 50).

These contours vary as √t. At the scale on
which we monitor, linear envelopes provide a good
approximation of the contours, as long as the envelope
boundaries have the right slope, that is, if they are
constructed with the right amount of slack time. Note,
too, that Figure 6 justifies the use of slack time in
envelopes: The contours associated with high payoffs
(and thus high ratio of hit probability to false positive
probability) allow a period of no progress at the
beginning of the plan.

5.3  Setting  Slack  Time 

A slack time envelope is just a pair of lines, one
representing the period in which no progress is required (the
slack time) and another connecting the end of the first
to the deadline, as shown in Figures 1 and 6. Slack
time is the only parameter in slack time envelopes, but
we must still show how to set it.

We desire a balance of false positives against
early warning premiums. We have not yet derived from
our binomial model a closed-form expression for the
expected number of false positives and early warnings,
but we have an algorithm that produces these expectations
for a given value of slack time, if we assume that
Dg = .5 Tg:

For each possible value of DR, dr_i:

a. calculate te, the time at which the envelope
boundary will be crossed, given dr_i; for
example, in Figure 1, when dr_i = 5 and t > 8,
the solid envelope boundary is crossed,
so for dr_i = 5, te = 8.

b. use the binomial model to calculate pe, the
probability of reaching te; for example if
dr_i = 3 and te = 5, Table 1 tells us that
pe = 10p^2q^3.

c. use the model to find the probability of a
false positive, pfp = Pr(success | DR(te) =
dr_i).

d. pe × pfp is the probability of a false positive
for this value of dr_i.

e. pe × (Tg - te) is the expected early warning
premium for this value of dr_i.

Tg - te is the time that remains before the
deadline at the envelope boundary at dr_i; this is why Tg
- te is called the early warning premium. The expected
early warning premium for a value of dr_i is just Tg - te
times the probability of crossing the envelope
boundary. The mean expected early warning premium is
the mean over all values of dr_i of pe(Tg - te). We expect
it to have higher values for lower slack times, because
the envelope boundaries for low slack times are further
from the deadline. The mean probability of false
positives is obtained by summing pe pfp for all values
of dr_i and dividing by the number of these values. We
expect it to rise, also, as slack time decreases, as
suggested by the contours in Figure 6.
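
Steps a through e and the two means can be sketched as follows. This is an illustration under stated assumptions, not the implementation used for the report: the envelope is taken to be a flat segment until the slack time followed by a straight line to the deadline, te is rounded up to a whole time step, pe is taken to be the binomial probability of having exactly dr_i remaining at te (as in the example of step b), and c = 1. All function names are ours.

```python
from math import ceil, comb


def pmf(n: int, r: int, p: float) -> float:
    """Binomial probability of r units of progress in n steps."""
    if r < 0 or r > n:
        return 0.0
    return comb(n, r) * p**r * (1.0 - p) ** (n - r)


def pr_success(dr: int, tr: int, p: float) -> float:
    """Probability of covering dr units in the tr remaining steps."""
    return sum(pmf(tr, r, p) for r in range(max(dr, 0), tr + 1))


def crossing_time(dr: int, slack: int, dg: int, tg: int) -> int:
    """Step a: time at which a trajectory with dr remaining meets the boundary.

    Assumed boundary: DR = dg until `slack`, then a straight line from
    (slack, dg) down to (tg, 0); the result is rounded up to a whole step.
    """
    return ceil(tg - dr * (tg - slack) / dg)


def evaluate_slack(slack: int, dg: int, tg: int, p: float) -> tuple[float, float]:
    """Mean false-positive probability and mean expected early warning premium."""
    fp_total = 0.0
    premium_total = 0.0
    for dr in range(1, dg + 1):
        te = crossing_time(dr, slack, dg, tg)   # step a
        pe = pmf(te, dg - dr, p)                # step b
        pfp = pr_success(dr, tg - te, p)        # step c
        fp_total += pe * pfp                    # step d
        premium_total += pe * (tg - te)         # step e
    return fp_total / dg, premium_total / dg


for slack in (5, 10, 15):
    print(slack, evaluate_slack(slack, dg=25, tg=50, p=0.5))
```

Running the loop over several slack values yields the kind of table referred to in the next paragraph.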

With a table of values for the mean probability
of false positives and the mean expected early warning
premium, and utilities for early warning and false
positives, we can make a rational decision about slack time.
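
One way to make that decision is to score each row of the table with a net utility and pick the best slack time. The sketch below assumes a linear utility (a value per unit of early warning premium minus a constant cost per false positive); that form, the function name, and the numbers in the usage example are hypothetical placeholders, not results from our experiments.

```python
def best_slack(table, warning_utility, false_positive_cost):
    """Return the slack time whose row maximizes the assumed linear net utility.

    Each row of `table` is (slack, mean_false_positive_probability,
    mean_expected_early_warning_premium).
    """
    def net_utility(row):
        _slack, mean_fp, mean_premium = row
        return warning_utility * mean_premium - false_positive_cost * mean_fp

    return max(table, key=net_utility)[0]


# Placeholder rows for illustration only.
example_table = [(5, 0.30, 4.0), (10, 0.10, 3.0), (15, 0.02, 1.0)]
print(best_slack(example_table, warning_utility=1.0, false_positive_cost=10.0))
```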

6  Conclusion 

Although we rely heavily on slack time envelopes in
the Phoenix planner, we have always constructed them
by heuristic criteria, and we did not know how to evaluate
their performance. In this paper we showed that
high performance can be achieved by hand-constructed
slack time envelopes, and we presented a probabilistic
model of progress, from which we derived a method for
automatically constructing slack time envelopes that
balance the benefits of early warnings against the costs
of false positives.

Other work has been done in this area; e.g.,
(Miller, 1989) constructs an execution monitoring
profile of acceptable ranges of sensor values for a
mobile robot (this profile is also called an "envelope").
If, during plan execution, a sensor value exceeds the
envelope boundaries, a reflex is triggered to adjust the
robot's behavior in such a way that the sensor readings
return to the acceptable range. (Sanborn and Hendler,
1988) have used monitoring and projection in a
simulated robot that tries to cross a busy street. The
robot has a basic street-crossing plan, but monitors
oncoming traffic and predicts possible collision points
which trigger reactive avoidance actions. Our
contribution has been to cast the problem in
probabilistic terms and to develop a framework for
evaluation. We are currently extending our work to other
models of progress and different, more complex
domains. A technical report covering this work in more
detail is in preparation.

5 Constructing an Envelope without a Model

An  abridged  version  of  this  paper  will  appear  in  the  Proceedings  of  the  AAAI-93  Workshop  on 
Learning  Action  Models. 


Learning  a  Decision  Rule  for 
Monitoring  Tasks  with  Deadlines 


Eric  A  Hansen  and  Paul  R.  Cohen 


Abstract 

A real-time scheduler or planner is responsible for managing tasks with deadlines.
When the time required to execute a task is uncertain, it may be useful to monitor the
task to predict whether it will meet its deadline; this provides an opportunity to make
adjustments or else to abandon a task that can't succeed. This paper treats monitoring
a task with a deadline as a sequential decision problem. Given an explicit model of task
execution time, execution cost, and payoff for meeting a deadline, an optimal decision
rule for monitoring the task can be constructed using stochastic dynamic programming.
If a model is not available, the same rule can be learned using temporal difference
methods. These results are significant because of the importance of this decision rule in
real-time computing.

1. Introduction

In hard real-time systems, tasks must meet deadlines and there is little or no value in
completing a task after its deadline. When the time required to execute a task is
uncertain, it may be useful to monitor the task as it executes so that failure to meet its
deadline can be predicted as soon as possible. Anticipating failure to meet a deadline
provides an opportunity to adjust: either to initiate a recovery action, or simply to abandon
a failing task to conserve limited resources.

A system that monitors ongoing tasks needs a decision rule to predict whether a task
will meet its deadline. The Phoenix planner uses a decision rule for this called an
"envelope" (Hart, Anderson, & Cohen, 1990) which defines a range of expected
performance over time. Envelopes are represented by a two-dimensional graph that plots
progress toward completion of a task as a function of time; for example, in Figure 1 the
shaded region represents an envelope. When the actual progress of a monitored task
falls outside the envelope boundary, failure to meet the task's deadline is predicted and
an appropriate action is taken.


Figure 1. A Phoenix envelope, plotted as distance remaining against time remaining
(0 to 50). The arrow represents the trajectory of a task. When it falls outside the
shaded region, a recovery action is initiated.


Monitoring progress to anticipate failure to meet a deadline is useful in other contexts
besides monitoring plans. For example, real-time problem-solvers monitor their
progress to determine whether to adjust their strategies to make sure a solution is
ready by a deadline (Durfee & Lesser, 1988; Lesser, Pavlin, & Durfee, 1988). Dynamic
schedulers for real-time operating systems monitor task execution so they can
anticipate failure to meet a deadline in time to abort a failing task and conserve resources
(Haban & Shin, 1990).


Despite the many contexts in which it is useful to monitor tasks with deadlines, no
principled method has yet been developed for constructing optimal decision
rules to predict whether a task will meet a deadline. The envelopes used by the
Phoenix planner were constructed and adjusted by trial and error to work
well. Similar decision rules used in other systems have also been heuristic or
constructed in an ad hoc way.

This paper describes how an optimal decision rule can be constructed. As an example, it
works through the construction of a rule for the simple case in which the only option for
a failing task is to abandon it. When recovery options are available a more complex
decision rule is called for, but constructing it is a straightforward extension of the
methods described here for constructing the rule in its simplest form.

Monitoring a task over the course of its execution requires a sequence of related
decisions; each of these decisions is not whether to continue the task to the end, but whether
to continue it a little longer before monitoring and reconsidering whether to continue it
further. A task should be continued as long as the expected value of continuing it until
the next monitoring action is greater than the expected value of abandoning it.

Knowing  the  expected  value  of  continuing  a  task  until  it  is  monitored  again  requires 
knowing  the  expected  value  of  continuing  it  after  that,  and  so  on  recursively  until  the 
deadline.  In  this  sense,  constructing  an  optimal  decision  rule  is  a  sequential  decision 
problem. 

Section 2 describes how to use dynamic programming to unravel this recursive
relationship backwards from the deadline. Using dynamic programming to construct the
decision rule requires an explicit model of probable task execution time, execution cost, and
the payoff for meeting the deadline. Section 3 describes machine learning techniques
that can learn the decision rule if a model is not available. Section 4 discusses the
generality of these results and possible extensions.

2.  Constructing  the  Decision  Rule  Using  Dynamic  Programming 

Dynamic programming is an optimization technique for solving problems that require
making a sequence of decisions. A decision rule for monitoring task execution can be
constructed using dynamic programming by assuming a task is monitored at discrete
time steps; any interval of time can correspond to a time step.

The decision problem is formalized as follows. A state is represented by two variables,
(t,d), where t is a non-negative integer that represents the number of time steps
remaining before the deadline and d is a non-negative integer that represents the part
of the task (the "distance") that remains to be completed. (This presupposes some unit
of progress in terms of which completion of a task can be measured.) The decision to
continue a task or abandon it is represented by a binary decision variable,
a ∈ {continue, stop}.


The amount of a task likely to be executed in one time step is represented by
state-transition probabilities, $P_{(t,d),(t-1,d-k)}(a)$. In this notation, the first subscript,
(t,d), is the state at the beginning of the time step and the second subscript,
(t-1,d-k), is the state at the end of the time step, where k is the number of units of
the task completed during the time step. The argument, a, is the action taken at the
beginning of the time step.

In addition to this stochastic state-transition model, a function, R(t,d,a), specifies the
single-step payoff for action, a, taken in state, (t,d). The execution cost per time step is
R(t,d,continue), while R(t,d,stop) is the value for finishing a task by its deadline (when
t ≥ 0, d = 0) or zero if the deadline is not met (t = 0, d ≥ 1).

The objective is to maximize the expected value of the sum of the single-step payoffs for
the course of a task. The difficulty is that the payoff for finishing a task by its deadline
is not received until the end of the task, although it must be considered in making
earlier decisions about whether to continue the task. Dynamic programming solves
sequential decision problems with "delayed payoff" by constructing an evaluation
function that provides secondary reinforcement; the key idea is that expected cumulative
value over the long term is maximized by choosing, at each step, the action that
maximizes the evaluation function in the short term.

The evaluation function for this decision problem is expressed by the recurrence
relation

$$V(t,d) = \max\{ R(t,d,\text{stop}),\; E[V(t-1,d-k)] + R(t,d,\text{continue}) \}$$

Expanding the term for the expected value of the next state, this becomes:

$$V(t,d) = \max\Big\{ R(t,d,\text{stop}),\; \sum_{k} \big[ P_{(t,d),(t-1,d-k)}(\text{continue}) \cdot V(t-1,d-k) \big] + R(t,d,\text{continue}) \Big\}$$

Dynamic programming systematically evaluates this recurrence relation to fill in a table
of values, V(t,d), that represents the evaluation function.

In  the  course  of  evaluating  this  recurrence  relation,  the  optimal  action  in  each  state 
(whether  to  stop  or  continue)  is  determined.  The  decision  rule  is  defined  implicitly  by 
the  evaluation  function,  as  follows. 


$$a = \begin{cases} \text{continue} & \text{if } V(t,d) > 0 \\ \text{stop} & \text{otherwise} \end{cases}$$

That is, continue if the expected value of continuing exceeds zero, the value for stopping.
(In the terminology of dynamic programming, a decision rule is also referred to as a
policy function.)


This formal notation describes the decision rule in its most general form. However it
leaves the state transition probabilities and costs used to compute the rule unspecified;
they are fit to the decision problem being modeled. There are two ways to obtain these
probabilities and costs. One is to specify them given some extraneous knowledge about
the task; the other is to learn them automatically. We will consider these in turn.

For the sake of having an example to work through in this paper, we make the following
arbitrary assumptions. Each time step is regarded as a single Bernoulli trial. This
makes the state transition probabilities:

$$P_{(t,d),(t-1,d-1)}(\text{continue}) = p$$

$$P_{(t,d),(t-1,d)}(\text{continue}) = 1 - p$$

where p is the probability of "success" in a Bernoulli trial. Completing a task of size d
is equivalent to succeeding in d Bernoulli trials, and completing the task of size d in
time t is equivalent to succeeding in d out of t trials. So task execution time is
binomially distributed with a mean and variance proportional to time.

Because a binomial distribution is approximately normal when tp(1 - p) > 10, and
because it seems likely that in many cases task execution time is approximately
normally distributed, this is a plausible model. The mean and variance of the binomial
probability function can be fit to the actual mean and variance of task execution time by
choosing the value of p so that the equation,

p = 1 - Variance(execution time) / Mean(execution time)

is satisfied, and by choosing the scale of the distance step so that the mean proportion of
a task executed in one time step is p · (distance step).
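
The fitting equation follows from the binomial moments: with n trials the mean is np and the variance is np(1 - p), so their ratio is 1 - p. A minimal sketch of the fit is given below; the function name and the validity checks are our additions, and the statistics passed in are assumed to have been measured in units of the chosen distance step.

```python
def fit_success_probability(mean: float, variance: float) -> float:
    """Choose p so that p = 1 - variance / mean, as in the equation above."""
    if mean <= 0:
        raise ValueError("mean must be positive")
    p = 1.0 - variance / mean
    if not 0.0 <= p <= 1.0:
        raise ValueError("statistics are not consistent with a binomial model")
    return p


# Moments of a binomial with n = 40 and p = 0.3: mean 12, variance 8.4.
print(fit_success_probability(12.0, 8.4))
```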

As  a  payoff  function,  we  assume: 

$$R(0,d,\text{stop}) = 0 \quad \text{for } d \ge 1$$

$$R(t,0,\text{stop}) = R \quad \text{for } t \ge 0$$

$$R(t,d,\text{continue}) = E \quad \text{for } t,d \ge 1$$

where R is the reward for finishing a task by its deadline and E is the execution cost
per time step. Like the binomial state-transition model, this parameterized payoff
function is very general and likely to fit many situations. However it too is only an example;
any payoff function could be used.

Given these state-transition probabilities and payoffs, the evaluation function for this
decision problem has the following simple recursive definition:

$$V(0,d) = 0 \quad \text{for } d \ge 1$$

$$V(t,0) = R \quad \text{for } t \ge 0$$

$$V(t,d) = \max\{ 0,\; p \cdot V(t-1,d-1) + (1-p) \cdot V(t-1,d) + E \} \quad \text{for } t,d \ge 1$$

The evaluation function for parameter values p = 0.5, R = 100 and E = -1 is shown in
Figure 2.

Figure 2. Evaluation function computed by dynamic programming.


The  decision  rule  is  represented  implicitly  by  the  evaluation  function,  since  a  task  is 
continued  as  long  as  the  expected  value  of  the  current  state  is  greater  than  zero. 
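
The recursive definition translates directly into a table-filling procedure. The sketch below uses the example parameters from the text (p = 0.5, R = 100, E = -1); the function and variable names are ours.

```python
def evaluation_table(max_t, max_d, p=0.5, reward=100.0, step_cost=-1.0):
    """Fill V[t][d] for 0 <= t <= max_t and 0 <= d <= max_d by dynamic programming."""
    # V(0, d) = 0 for d >= 1: the deadline has passed with work remaining.
    table = [[0.0] * (max_d + 1) for _ in range(max_t + 1)]
    # V(t, 0) = R for t >= 0: the task is finished by its deadline.
    for t in range(max_t + 1):
        table[t][0] = reward
    for t in range(1, max_t + 1):
        for d in range(1, max_d + 1):
            continue_value = (p * table[t - 1][d - 1]
                              + (1.0 - p) * table[t - 1][d]
                              + step_cost)
            table[t][d] = max(0.0, continue_value)  # stopping is worth 0
    return table


def decide(table, t, d):
    """Decision rule: continue while the value of the current state is positive."""
    return "continue" if table[t][d] > 0 else "stop"


V = evaluation_table(50, 50)
print(V[1][1], V[2][1], decide(V, 1, 2))
```

For example, with one step left and one unit remaining the value is 0.5 · 100 - 1 = 49, while with one step left and two units remaining the task cannot finish and the rule says stop.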

In the following graph the decision rule is projected onto two dimensions and extended
out to a starting time of 200 before the deadline; this shows that for d > 50 it is not
worth even beginning the task because the expected cost of completing the task is
greater than the potential reward for finishing it.

Figure 3. Representation of decision rule computed by dynamic programming
(d, from 10 to 50, plotted against time before the deadline, from 50 to 200).
If the trajectory of a task goes above the line, the task is abandoned.


The shape of this decision rule is strikingly similar to the shape of the envelopes
constructed by trial and error and used by the Phoenix planner.
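
The boundary plotted in Figure 3 can be extracted from the evaluation function by recording, for each amount of time remaining, the largest distance for which continuing still has positive value. The sketch below assumes the example parameters p = 0.5, R = 100 and E = -1 and keeps only two rows of the table at a time; the function name is ours.

```python
def decision_boundary(max_t, max_d, p=0.5, reward=100.0, step_cost=-1.0):
    """For each t in 0..max_t, the largest d with V(t, d) > 0 (0 if none)."""
    previous = [reward] + [0.0] * max_d          # row V(0, .)
    boundary = [0]                               # nothing is worth continuing at t = 0
    for _ in range(1, max_t + 1):
        current = [reward] + [0.0] * max_d
        for d in range(1, max_d + 1):
            current[d] = max(0.0, p * previous[d - 1]
                             + (1.0 - p) * previous[d] + step_cost)
        worthwhile = [d for d in range(1, max_d + 1) if current[d] > 0]
        boundary.append(max(worthwhile, default=0))
        previous = current
    return boundary


boundary = decision_boundary(120, 60)
print(boundary[10], boundary[50], boundary[120])
```

The boundary rises with the time remaining and stays below d = 50, since the expected cost of 2d time steps at unit cost would then exceed the reward of 100.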

3.  Learning  the  Decision  Rule  Using  Temporal  Difference  Methods 

When an accurate model of the state transition probabilities and costs is available, an
optimal decision rule can be computed off-line using dynamic programming. When an
accurate model is not available, however, there are still machine learning methods that
can gradually adapt a decision rule until it converges to one that is optimal. Called
temporal difference (TD) methods (Barto, Sutton, & Watkins, 1990), they solve the
so-called "temporal credit-assignment problem" inherent in sequential decision problems
with delayed payoff by, in effect, approximating dynamic programming.

It can be shown that the recurrence relation for an evaluation function constructed by
dynamic programming only holds true for a decision rule (i.e., policy) that is optimal.
This provides the basis for TD methods. TD methods learn an evaluation function in
addition to a decision rule. In our monitoring example, the recurrence relation that
defines the evaluation function is

$$V(t,d) = E[V(t-1,d-k)] + R(t,d,\text{continue})$$

Any  measured  difference  during  training  between  the  values  of  the  two  sides  of  the 
equation  is  treated  as  an  "error"  that  is  used  to  adjust  the  decision  rule.  The  TD  error 
measured  after  each  time  step  is: 

$$V(t,d) - V(t-1,d-k) - R(t,d,\text{continue})$$

The difference between this definition of TD error and the recurrence relation for the
evaluation function is that the expected value, E[V(t-1,d-k)], is replaced by V(t-1,d-k).
This is necessary because the expected value cannot be computed without a model of the
state-transition probabilities. However TD training works because the learned value of
each state regresses towards the weighted average of the values of its successor states,
where the weightings reflect the conditional probabilities of the successor states. So in
the limit, the value of each state converges to the expected value of its successor states
plus the single-step payoff, and hence converges to the expected cumulative value.

By adapting the decision rule to minimize the TD error, an optimal decision rule is
gradually learned along with an evaluation function. Adjusting the decision rule
changes the evaluation function, which in turn serves as feedback to the learning
algorithm for continued adjustment of the decision rule; the two are adapted
simultaneously. In most cases of TD learning, the decision rule and evaluation function must be
represented separately. However this example is particularly straightforward because
the simple relationship between the two (the decision rule is defined by a simple
threshold on the evaluation function) makes it possible to represent both by the same
function.

A complication is that learning takes place only as long as a task is continued, and so
only inside the "boundary" of the decision rule. If this boundary is inadvertently set too
conservatively, it cannot be unlearned unless a task is occasionally continued from a
state outside the boundary to see what happens. This is characteristic of trial-and-error
learning; occasionally actions that appear suboptimal must be taken so that the relative
merits of actions can be assessed. This is managed by including a stochastic element in
the decision rule, for example:

$$a = \begin{cases} \text{continue} & \text{if } V(t,d) + (\mathrm{random}(2K) - K) > 0 \\ \text{stop} & \text{otherwise} \end{cases}$$


The decision to use TD methods for training is independent of the decision about what
function representation and learning algorithm to use. We show this by describing two
different representations for the evaluation function and two different learning
algorithms.

3.1  Table  Representation  and  Linear  Update  Rule 

If we represent the evaluation function by a two-dimensional table, as we have for
dynamic programming, then the values in the table can be adjusted by the following
learning rule:

$$V(t,d) := V(t,d) + \alpha \cdot \text{error}$$

This learning rule increments or decrements the value of the current state by an
amount proportional to the TD error, defined as:

$$\text{error} = R(t,d,\text{continue}) + V(t-1,d-k) - V(t,d)$$

as well as proportional to a learning rate, α (in this example, set to 0.1). This linear
learning rule minimizes the TD error by gradient descent.

A training regimen consists of repeatedly starting a task from a random state, (t,d),
and continuing it until it finishes or is abandoned; each task counts as a learning trial.
For the purpose of generating a learning curve, performance is measured by comparing
the evaluation function computed by stochastic dynamic programming to the table of
values learned by TD training and measuring the mean square error. This comparison
gives rise to the learning curve shown in Figure 4.
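
The training regimen can be sketched as follows. This is an illustration under stated assumptions rather than the code used to produce Figure 4: interior table entries start at zero, progress in each step is simulated as a Bernoulli trial with the example parameters p = 0.5, R = 100 and E = -1, and the stochastic element of the decision rule is uniform noise in a range whose half-width (`noise`) is a hypothetical choice. All names are ours.

```python
import random


def td_train(trials=2000, max_t=20, max_d=10, p=0.5, reward=100.0,
             step_cost=-1.0, alpha=0.1, noise=5.0, seed=0):
    """Learn a table V[t][d] by temporal difference updates on simulated tasks."""
    rng = random.Random(seed)
    table = [[0.0] * (max_d + 1) for _ in range(max_t + 1)]
    for t in range(max_t + 1):
        table[t][0] = reward                  # finished tasks are worth the reward
    for _ in range(trials):
        # Each trial starts a task from a random state (t, d).
        t, d = rng.randint(1, max_t), rng.randint(1, max_d)
        while t > 0 and d > 0:
            # Stochastic decision rule: threshold on V plus exploration noise.
            if table[t][d] + rng.uniform(-noise, noise) <= 0:
                break                         # abandon the task
            k = 1 if rng.random() < p else 0  # units completed this step
            error = step_cost + table[t - 1][d - k] - table[t][d]
            table[t][d] += alpha * error      # linear update rule
            t, d = t - 1, d - k
    return table


learned = td_train()
print(round(learned[1][1], 1), round(learned[5][2], 1))
```

Comparing `learned` entry by entry with the table computed by dynamic programming gives the mean square error used for the learning curve.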

Figure 4. Learning curve for TD training, using a
table representation and linear update rule (vertical axis: total mean square error).

Figure 5. Evaluation function learned by TD training,
using a table representation and linear update rule.


3.2  Connectionist  Representation  and  Error  Backpropagation  Rule 

The problem with representing the evaluation function by a table is that it requires
O(n^2) storage, where n is the number of time steps from the start of the task to its
deadline. A more compact function representation would be better, although it still must be
capable of representing a nonlinear function. One possibility is a feedforward
connectionist network trained by the error backpropagation rule. The simplest network
possible for this problem is a single neuron with two inputs and a sigmoid activation
function. It corresponds to the formula


$$V(t,d) = \frac{1}{1 + e^{-(w_1 t + w_2 d + w_3)}}$$


where w1, w2, and w3 are the learned weights. This simple representation turns out to
work surprisingly well. Trained by temporal differences using backpropagation, it
converges many times faster than when a table representation and linear update rule
are used, as the learning curve in Figure 6 illustrates.
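
A single TD update of such a neuron can be sketched as below. Several details are assumptions made only for illustration: the weights w1 and w2 multiply t and d and w3 acts as a bias, payoffs are rescaled to the unit interval (reward 1, step cost -0.01) so that a sigmoid output can represent them, terminal states use their known values, and the TD target is treated as a constant when the gradient is taken. The function names and the step size `eta` are ours.

```python
import math


def neuron_value(w, t, d):
    """Sigmoid neuron with inputs t and d and a bias weight."""
    return 1.0 / (1.0 + math.exp(-(w[0] * t + w[1] * d + w[2])))


def td_target(w, t, d, k, reward=1.0, step_cost=-0.01):
    """Single-step payoff plus the value of the successor state (t-1, d-k)."""
    next_t, next_d = t - 1, d - k
    if next_d == 0:
        next_value = reward          # task finished by its deadline
    elif next_t == 0:
        next_value = 0.0             # deadline reached with work remaining
    else:
        next_value = neuron_value(w, next_t, next_d)
    return step_cost + next_value


def td_backprop_step(w, t, d, k, eta=0.05):
    """One gradient step that moves V(t, d) toward the TD target."""
    value = neuron_value(w, t, d)
    error = td_target(w, t, d, k) - value
    slope = value * (1.0 - value)            # derivative of the sigmoid
    inputs = (float(t), float(d), 1.0)
    new_w = [wi + eta * error * slope * x for wi, x in zip(w, inputs)]
    return new_w, error


weights = [0.0, 0.0, 0.0]
weights, err = td_backprop_step(weights, t=3, d=1, k=1)
print(err, neuron_value(weights, 3, 1))
```

Because the three weights are shared by all states, each update also shifts the values of neighboring states.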


Figure 6. Learning curve for TD training
of a single neuron by error backpropagation.

Figure 7. Evaluation function learned by TD training
of a single neuron by error backpropagation.


The reason convergence is faster is that the parameterized function represented by the
neuron makes generalization possible; in addition, it is able to closely approximate the
optimal evaluation function. The learned evaluation function is shown in Figure 7.

Besides being more space-efficient, a connectionist network also has the advantage of
being able to represent a continuous evaluation function, instead of the discrete function
presupposed by a table representation. It allows for generalization as well because the
possible input values are not limited by the size of the table.

4.  Discussion 


This paper treats monitoring tasks with deadlines as a sequential decision problem,
which makes available a class of methods based on dynamic programming for
constructing a decision rule for monitoring. When an explicit model of the state-transition
probabilities and costs is available, the rule can be constructed off-line using stochastic
dynamic programming. Otherwise it can be learned on-line using TD methods that
approximate dynamic programming.


It makes sense to construct a decision rule such as the one described in this paper for
tasks that are repeated many times or for a class of tasks with the same behavior. This
allows the rule to be learned, if TD methods are relied on; or for statistics to be gathered
to characterize a probability and cost model, if dynamic programming is relied on.
However if a model is known beforehand, or can be estimated, a decision rule can also
be constructed for a task that executes only once.


The time complexity of the dynamic programming [...]
number of time steps from the start of the [...]
rule may be computed once and reused [...]
learning, O(n^2), is mitigated by the [...]
overhead of representing an evaluation function by a table is avoidable by using a more
compact function representation, such as a connectionist network.

Besides the fact that the approach described in this paper is not computationally intensive,
it has other advantages. It is conceptually simple. The decision rule it constructs
is optimal, or converges to the optimal in the case of TD learning. It works no matter
what probability model characterizes the execution time of a task and no matter what
cost model applies, and so is extremely general. Finally, it works even when no model of
the state transition probabilities and costs is available, although a model can be taken
advantage of.

These results can be extended in a couple of obvious ways. The first is to factor in a cost
for monitoring. In this paper we assume monitoring has no cost, or its cost is negligible.
This allows monitoring to be nearly continuous, in effect, for a task to be monitored each
time step. Others who have developed similar decision rules have also assumed the cost
of monitoring is negligible. However, in some cases the cost of monitoring may be significant,
so in another paper we show how this cost can be factored in (forthcoming). Once
again we use dynamic programming and TD methods to develop optimal monitoring
strategies.

The second way in which this work can be extended is to make the decision rule more
complicated. In this paper we analyzed a simple example in which the only alternative
to continuing a task is to abandon it. But recovery options may be available as well. A
dynamic scheduler for a real-time operating system is unlikely to have recovery options
available, but an AI planner or problem-solver is almost certain to have them (Lesser,
Pavlin & Durfee, 1988; Howe, 1992). The way to handle the more complicated decision
problem this poses is to regard each recovery option as a separate task characterized by
its own probability model and cost model, so at any point the expected value of the
option can be computed. Then instead of choosing between two options, either continuing
a task or abandoning it, the choice includes the recovery options as well. The rule is
simply to choose the option with the highest expected value.

6. Building Causal Models of Planner Behavior using Path Analysis

This paper appeared in the Proceedings of the First International Conference on AI Planning Systems.


Predicting  and  Explaining  Success  and  Task  Duration 
in  the  Phoenix  Planner 

David  M.  Hart  and  Paul  R.  Cohen 


Abstract 

Phoenix  is  a  multi-agent  planning  system  that 
fights  simulated  forest-fires.  In  this  paper  we 
describe  an  experiment  with  Phoenix  in  which 
we uncover factors that affect the planner's behavior
and test predictions about the planner's
robustness  against  variations  in  some  of  these 
factors.  We  also  introduce  a  technique — path 
analysis — for  constructing  and  testing  causal 
explanations  of  the  planner's  behavior. 

1  INTRODUCTION 

It  is  difficult  to  predict  or  even  explain  the  behavior  of  any 
but  the  simplest  AI  programs.  A  program  will  solve  one 
problem  readily,  but  make  a  complete  hash  of  an 
apparently  similar  problem.  For  example,  our  Phoenix 
planner,  which  fights  simulated  forest  fires,  will  contain 
one  fire  in  a  matter  of  hours  but  fail  to  contain  another 
under very similar conditions. We therefore hesitate to
claim that the Phoenix planner "works." The claim would
not be very informative, anyway: we would much rather
be able to predict and explain Phoenix's behavior in a wide
range of conditions (Cohen 1991). In this paper we
describe an experiment with Phoenix in which we uncover
factors that affect the planner's behavior and test
predictions about the planner's robustness against
variations in some factors. We also introduce a technique,
path analysis, for constructing and testing causal
explanations of the planner's behavior. Our results are
specific to the Phoenix planner and will not necessarily
generalize to other planners or environments, but our techniques
are general and should enable others to derive comparable
results for themselves.

In overview, Section 2 introduces the Phoenix planner;
Section 3 describes an experiment in which we identify
factors that probably influence the planner's behavior; and
Section 4 discusses results and one sense in which the
planner works "as designed." But these results leave much
unexplained; although Section 4 identifies some factors
that affect the success and the duration of fire-fighting
episodes, it does not explain how these factors interact.
Section 5 shows how correlations among the factors that
affect behavior can be decomposed to test causal models
that include these factors.

2  PHOENIX  OVERVIEW 

Phoenix is a multi-agent planning system that fights simulated
forest-fires. The simulation uses terrain, elevation,
and feature data from Yellowstone National Park and a
model of fire spread from the National Wildlife Coordinating
Group Fireline Handbook (National Wildlife Coordinating
Group 1985). The spread of fires is influenced by
wind and moisture conditions, changes in elevation and
ground cover, and is impeded by natural and man-made
boundaries such as rivers, roads, and fireline. The Fireline
Handbook also prescribes many of the characteristics of
our firefighting agents, such as rates of movement and
effectiveness of various firefighting techniques. For
example, the rate at which bulldozers dig fireline varies
with the terrain. Phoenix is a real-time simulation environment:
Phoenix agents must think and act as the fire
spreads. Thus, if it takes too long to decide on a course of
action, or if the environment changes while a decision is
being made, a plan is likely to fail.

One Phoenix agent, the Fireboss, coordinates the firefighting
activities of all field agents, such as bulldozers and
watchtowers. The Fireboss is essentially a thinking
agent, using reports from field agents to form and
maintain a global assessment of the world. Based on
these reports (e.g., fire sightings, [...]
progress), it selects and instantiates fire-fighting plans and
directs field agents in the execution of plan subtasks.

A new fire is typically spotted by a watchtower, which
reports observed fire size and location to the Fireboss.
With this information, the Fireboss selects an appropriate
fire-fighting plan from its plan library. Typically these
plans dispatch bulldozer agents to the fire to dig fireline.
An important first step in each of the three plans in the
experiment described below is to decide where fireline
should be dug. The Fireboss projects the spread of the fire
based on prevailing weather conditions, then considers the
number of available bulldozers and the proximity of
natural boundaries. It projects a bounding polygon of fireline
to be dug and assigns segments to bulldozers based on
a periodically updated assessment of which segments will
be reached by the spreading fire soonest. Because there are
usually many more segments than bulldozers, each
bulldozer digs multiple segments. The Fireboss assigns
segments to bulldozers one at a time, then waits for each
bulldozer to report that it has completed its segment
before assigning another. This ensures that segment
assignment incorporates the most up-to-date information
about overall progress and changes in the prevailing conditions.

Once a plan is set into motion, any number of problems
might arise that require the Fireboss's intervention. The
types of problems and mechanisms for handling them are
described in Howe & Cohen 1990, but one is of particular
interest here: As bulldozers build fireline, the Fireboss
compares their progress to expected progress.[2] If their
actual progress falls too far below expectations, a plan
failure occurs, and (under the experiment scenario described
here) a new plan is generated. The new plan uses the
same bulldozers to fight the fire and exploits any fireline
that has already been dug. We call this error recovery
method replanning. Phoenix is built to be an adaptable
planning system that can recover from plan failures (Howe
& Cohen 1990). Although it has many failure-recovery
methods, replanning is the focus of the experiment
described in the next section.

3  IDENTIFYING  THE  FACTORS 
THAT  AFFECT  PERFORMANCE 


[2] Expectations about progress are stored in envelopes. Envelopes
represent the range of [...] the knowledge used [...] the plan.
If actual progress falls outside this range, an
envelope violation occurs, invoking error recovery mechanisms
(Cohen, [...] & St. Amant 1992; Anderson & [...] 1990).


We designed an experiment with two purposes. A confirmatory
purpose was to test predictions that the planner's
performance is sensitive to some environmental conditions
but not others. In particular, we expected performance
to degrade when we change a fundamental relationship
between the planner and its environment (the
amount of time the planner is allowed to think relative to
the rate at which the environment changes) and not be
sensitive to common dynamics in the environment such
as weather, and particularly, wind speed. We tested two
specific predictions: 1) that performance would not degrade
or would degrade gracefully as wind speed increased; and 2)
that the planner would not be robust to changes in the
Fireboss's thinking speed due to a bottleneck problem
described below. An exploratory purpose of the experiment
was to identify the factors in the Fireboss architecture
and Phoenix environment that most affected the planner's
behavior, leading to the causal model developed in
Section 5.

The  Fireboss  must  select  plans,  instantiate  them,  dispatch 
agents  and  monitor  their  progress,  and  respond  to  plan 
failures as the fire burns. The rate at which the Fireboss
thinks  is  determined  by  a  parameter  called  the  Real  Time 
Knob.  By  adjusting  the  Real  Time  Knob  we  allow  more 
or  less  simulation  time  to  elapse  per  unit  CPU  time, 
effectively  adjusting  the  speed  at  which  the  Fireboss 
thinks  relative  to  the  rate  at  which  the  environment 
changes. 

The Fireboss services bulldozer requests for assignments,
providing each bulldozer with a task directive for each new
fireline segment it builds. The Fireboss can become a
bottleneck when the arrival rate of bulldozer task requests
is high or when its thinking speed is slowed by adjusting
the Real Time Knob. This bottleneck sometimes causes
the overall digging rate to fall below that required to complete
the fireline polygon before the fire reaches it, which
causes replanning (see Section 2). In the worst case, a
Fireboss bottleneck can cause a thrashing effect in which
plan failures occur repeatedly because the Fireboss can't
assign bulldozers during replanning fast enough to keep
the overall digging rate at effective levels. We designed
our experiment to explore the effects of this bottleneck on
system performance and to confirm our prediction that performance
would vary in proportion to the manipulation of
thinking speed. Because the current design of the Fireboss
is not sensitive to changes in thinking speed, we expect it
to take longer to fight fires and to fail more often to
contain them as thinking speed slows.

In  contrast,  we  expect  Phoenix  to  be  able  to  fight  fires  at 
different  wind  speeds.  It  might  take  longer  and  sacrifice 
more  area  burned  at  high  wind  speeds,  but  we  expect  this 
effect to be proportional as wind speed increases and we
expect  Phoenix  to  succeed  equally  often  at  a  range  of  wind 
speeds,  since  it  was  designed  to  do  so. 

3.1  EXPERIMENT  DESIGN 
We created a straightforward fire fighting scenario that
controlled for many of the variables known to affect the
planner's performance. In each trial, one fire of a known
initial size was set at the same location (an area with no
natural boundaries) at the same time (relative to the start
of the simulation). Four bulldozers were used to fight it.
The wind's speed and direction were set initially and not
varied during the trial. Thus, in each trial, the Fireboss
receives the same fire report, chooses a fire-fighting plan,
and dispatches the bulldozers to implement it. A trial
ends when the bulldozers have successfully surrounded the
fire or after 120 hours without success.

The  experiment's  first  dependent  variable  then  is  Success, 
which  is  true  if  the  fire  is  contained,  and  false  otherwise. 
A second dependent variable is shutdown time (SD), the
time at which the trial was stopped. For successful trials,
shutdown time tells us how long it took to contain the
fire.

Two independent variables were wind speed (WS) and the
setting of the Fireboss's Real Time Knob (RTK). A third
variable, the first plan chosen by the Fireboss in a trial
(FPLAN), varied randomly between trials. It was not
expected to influence performance, but because it did, we
treat it here as an independent variable.

WS: The settings of WS in the experiment were 3, 6, and
9 kilometers per hour. As wind speed increases, fire
spreads more quickly in all directions, and most quickly
downwind. The Fireboss compensates for higher values
of wind speed by directing bulldozers to build fireline further
from the fire.

RTK: The default setting of RTK for Phoenix agents allows
them to execute 1 CPU second of Lisp code for every
5 minutes that elapses in the simulation. We varied the
Fireboss's RTK setting in different trials (leaving the settings
for all other agents at the default). We started at a
ratio of 1 simulation-minute/CPU-second, a thinking speed
5 times as fast as the default, and varied the setting over
values of 1, 3, 5, 7, 9, 11, and 15 simulation-minutes/CPU-second.
These values range from 5 times the
normal speed at a setting of 1 down to one-third the
normal speed at 15. The values of RTK reported here are
rescaled. The normal thinking speed (5) has been set to
RTK=1, and the other settings are relative to normal. The
scaled values (in order of increasing thinking speed) are
.33, .45, .56, .71, 1, 1.67, and 5. RTK was set at the
start of each trial and held constant throughout.
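
The rescaling is a simple ratio. The sketch below, with variable names of our own choosing, divides the default setting (5 simulation-minutes per CPU-second) by each raw setting and reproduces the scaled values listed above.

```python
# Default Real Time Knob setting: simulation-minutes per CPU-second.
DEFAULT_SETTING = 5

# Raw settings used for the Fireboss in the experiment.
raw_settings = [1, 3, 5, 7, 9, 11, 15]

# Scaled RTK = thinking speed relative to normal (larger means faster).
scaled = {raw: round(DEFAULT_SETTING / raw, 2) for raw in raw_settings}

for raw, rtk in sorted(scaled.items(), key=lambda item: item[1]):
    print(f"raw setting {raw:>2} -> scaled RTK {rtk}")
```

Because fewer simulation-minutes per CPU-second means more thinking per unit of simulated time, the smallest raw setting maps to the largest scaled value.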

FPLAN:  The  Fireboss  randomly  selects  one  of  three 
plans  as  its  first  plan  in  each  trial.  The  plans  differ 
mainly  in  the  way  they  project  fire  spread  and  decide  where 
to dig fireline. SHELL is aggressive, assuming an
optimistic combination of low fire spread and fast
progress on the part of bulldozers. MODEL is conservative
in its expectations, assuming a high rate of spread and a
lower rate of progress. The third, MBIA, generally makes
an assessment intermediate with respect to the others.
When replanning is necessary, the Fireboss again chooses
randomly from among the same three plans.

We adopted a basic factorial design, systematically varying
the values of WS and RTK. Because we had not anticipated
a significant effect of FPLAN, we allowed it to vary
randomly.

4  RESULTS  FOR  SUCCESS  RATE 
AND  SHUTDOWN  TIME 

Figure 1: Successes by a) Real Time Knob, b) Wind Speed, and c) First Plan Tried

We collected data for 343 trials, of which 215 succeeded
and 128 failed, for an overall success rate of 63%. Tables
1a-c break down successes and failures for each setting of
the independent variables RTK, WS, and FPLAN. Column
S in these tables is the number of Successes, F is the
number of Failures, and Tot is the total number of trials.
Certain trends emerge in these data that confirm our earlier
predictions. For example, in Table 1a, the success rate
improves steadily as the thinking speed of the Fireboss
increases. However, other patterns are less clear, such as
the differences for each setting of WS in Table 1b. How
do we know if these values are significantly different? For
a categorical dependent variable such as Success (which
has only two possible values), a chi-square test (χ²) will
determine whether the observed pattern is statistically
significant.
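
As an illustration of the test named above, the sketch below computes the Pearson chi-square statistic and its degrees of freedom for a contingency table. The counts are invented for the example and are not the data from Tables 1a-c; the function name is ours.

```python
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom.

    `table` is a list of rows of observed counts, e.g. one row for
    Success and one for Failure, one column per setting of a factor.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if outcome and factor were independent.
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


# Hypothetical counts: rows are Success / Failure, columns are two settings.
stat, dof = chi_square([[30, 20],
                        [20, 30]])
print(stat, dof)
```

The statistic is then compared with the chi-square distribution for the computed degrees of freedom. A table with two outcomes and seven RTK settings has (2-1)(7-1) = 6 degrees of freedom, and one with three WS settings has 2.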

Figures  la-c  show  the  success  rates  for  each  setting  of 
each  independent  variable.  The  table  categories  Success 
and Failure are broken down further into those trials which
did  not  replan  and  those  that  did. 

4.1 EFFECT OF INDEPENDENT
VARIABLES ON SUCCESS

Table 1a shows successes by the independent variable
RTK. A chi-square test on the Success-Failure x RTK contingency
table in Table 1a is highly significant (χ²(6) =
49.081, p < 0.001), indicating that RTK strongly influences
the relative frequency of successes and failures. At
the fastest thinking speed for the Fireboss, RTK=5, the
success rate is 98%, but at the slowest rate, RTK=.33, the
success rate is only 33%. Figure 1a shows graphically
that as RTK goes down (i.e., thinking speed decreases) the
success rate declines. At RTK=1, the default setting, 63%
of the trials were successful. Note how rapidly the success
of the initial plan decreases: for RTK < .45, no trial
succeeds without replanning. However, the overall
success rate declines more slowly as replanning is used to
recover from the bottleneck effect described in Section 3.
If we compare the rate of success without replanning to
that with replanning in Figure 1a, we see that replanning
buffers the Phoenix planner, allowing it to absorb the
effect of changes in Fireboss RTK without failing. This
effect is statistically highly significant.

Table 1a: Trials Partitioned by Real Time Knob.

Table 1b shows successes by wind speed. The small
differences in success are marginal (χ²(2) = 5.354, p <
0.069), as we predicted in Section 3. Figure 1b shows a
curious trend: as WS increases, the success rate for the
first plan goes up, while the success rate in trials
involving replanning diminishes. The increase in success
rate for the first plan occurs because as WS increases,
Phoenix overestimates the growth of the fire and plans a
more conservative containing fireline.

Table 1b: Trials Partitioned by Wind Speed.

Table 1c shows successes by first plan tried. Differences
in success are highly significant (χ²(2) = 16.183, p <
0.001), which we had not expected when designing the
experiment. As shown in Figure 1c, SHELL has a very
low success rate without replanning, reflecting its
aggressive character, while the conservative MODEL has an
initial success rate of 65%. MBIA's initial success rate is
slightly better than SHELL's (though the difference is not
statistically significant).

Table 1c: Trials Partitioned by First Plan Tried.

4.2 EFFECT OF RTK ON SHUTDOWN TIME
Figure 2 shows the effect of RTK on the dependent variable
Shutdown time (SD). The interesting aspect of this
behavior is the transition at RTK=1. SD increases gradually
between RTK=5 and 1, and the 95% confidence intervals
around the mean values overlap. Below 1, however,
the slope changes markedly and the confidence intervals
are almost disjoint from those for values above 1. This
shift in slope and value range for SD suggests a threshold
effect in Phoenix as the Fireboss's thinking speed is
reduced below the normal setting of RTK. The cost of
resources in Phoenix is proportional to the time spent
fighting fires, so a threshold effect such as this represents
a significant discontinuity in the cost function for
resources used. For this reason we pursued the cause(s) of
this discontinuity by modeling the effects of the independent
variables on several key endogenous variables, and
through them on SD, with the intent of building a causal
model of the influences on SD.

5  INFLUENCE  OF  ENDOGENOUS 
VARIABLES  ON  SHUTDOWN  TIME 

We measured about 40 endogenous variables in the experiment
described above, but three are of particular interest
in this analysis: the amount of fireline built by the bulldozers
(FB), the number of fire-fighting plans tried by the
Fireboss for a given trial (#PLANS), and the overall utilization
of the Fireboss's thinking resources (OVUT).


Figure 2: Mean Shutdown Time (in Hours) by Real Time
Knob.  Error  Bars  Show  95%  Confidence  Intervals. 

FB: The value of this variable is the amount of fireline
actually built at the end of the trial. FB sets a lower limit
on SD, because bulldozers have a maximum rate at which
they can dig. Thus, when the Fireboss is thinking at the
fastest speed and servicing bulldozers with little wait time,
SD will be primarily determined by how much fireline
must be built.

#PLANS: When a trial ran to completion without replanning,
#PLANS was set to 1. Each time the Fireboss replanned,
#PLANS was incremented. #PLANS is an important
indicator of the level of difficulty the planner has
fighting a particular fire. It also directly affects FB. As
described in Section 2, replanning involves projecting a
new polygon for the bulldozers to dig. Typically the new
polygon is larger than the previous one, because the fire
has now spread to a point where the old one is too close
to the fire. Thus, the amount of fireline to be dug tends
to increase with the number of replanning episodes.

OVUT: This variable, overall utilization, is the ratio of
the time the Fireboss spends thinking to the total duration
of a trial. Thinking activities include monitoring the
environment and agents' activities, deciding where fireline
should be dug, and coordinating agents' tasks (Cohen et al.
1989). The Fireboss is sometimes idle, having done
everything on its agenda, and so it waits until a message
arrives from a field agent or enough time passes that
another action becomes eligible. We expected to see
OVUT increase as RTK decreases; that is, as the Fireboss's
thinking speed slows down, it requires a greater and greater
proportion of the time available to do the cognitive work
required by the scenario. Replanning only adds to the
Fireboss's cognitive workload.


Table 2a: Regression for Y: SD on X's: WS, RTK, FPLAN,
OVUT, #PLANS, FB

Table 2b: Correlation Coefficients

5.1 REGRESSION ANALYSIS
Having identified these variables, we set about quantifying
their effects using multiple regression.[8] We regressed SD
on WS, RTK, FPLAN, OVUT, #PLANS and FB. These factors
accounted for 76% of the variance in SD. Standardized
beta coefficients are often cited as measures of the relative
influence of factors; in Table 2a they tell us that FB has
the largest influence on SD (beta = .759), with RTK and
OVUT following close behind. But if the betas represent
the strength of influence, they are surprising. OVUT has a
negative influence on SD, which is counterintuitive and
appears to contradict the positive correlation (.42) between
them in Table 2b. WS and #PLANS have virtually no
influence on SD, even though #PLANS is strongly
correlated with SD (.718). And although WS is essentially
uncorrelated with SD (-.053), it is correlated with FB
(.363), which in turn is strongly correlated with SD
(.755). Finally, WS and RTK are correlated in Table 2b
(.282), which seems impossible given that they were
varied systematically. In short, the regression analysis
and the correlation matrix contain counterintuitive entries.
We will see this is because regression is based on an
implicit model, one that almost certainly does not
correspond to the structure of Phoenix.


[8] Regression builds a linear model of the effects of any
number of X variables on Y, which in this case
is SD. It fits a [...] to the data in an n-dimensional space using
the least squares [...], where n = the number of X variables + 1.


5.2  PATH  ANALYSIS 

A technique called path analysis (Asher 1983, Li 1975)
lets us view correlation coefficients of the variables in
Table 2b as sums of hypothesized influences among factors.
Consider the surprising result that wind speed (WS)
is essentially uncorrelated with shutdown time (SD). We
expected WS to have two possible effects on SD:

Effect 1. If WS increases then the fire burns faster,
and this means more fireline must be built (i.e., FB
increases), which will take longer. Therefore
increasing WS should increase SD.

Effect 2. For high wind speeds, if a fire isn't contained
relatively quickly, then it might not be contained
at all. For example, if a fire has been
burning for 60 hours or more, and WS = 3, then the
probability of the fire being eventually contained is
.375. But if WS = 6, the probability of eventually
containing an old fire is only .2, and if WS = 9, the
probability drops to .13. We measured SD for successful
trials only, because, by definition, an
unsuccessful trial is one that exceeds a specified SD
without containing the fires. But successful containment
of old fires is relatively unlikely at higher
wind speeds, so as WS increases, we see fewer older
fires contained, thus fewer high values of SD. This
leads us to expect a negative correlation between
WS and SD. Note that this correlation represents an
effect of missing data, not a true negative causal
relationship between WS and SD.


One of the measures produced by multiple regression [...]

Path analysis enables us to test a model in which the correlation
r_WSSD is composed of Effect 1 and Effect 2,
which cancel each other out. Consider, for example, the
path diagram in Figure 3. It shows WS positively influencing
the amount of fireline that gets built (FB), and FB
positively influencing SD (we will shortly describe how
the numbers are derived). This path, WS→FB→SD, corresponds
to Effect 1, above, and is called an indirect effect of
WS on SD, mediated by FB. At the same time, WS directly
and negatively influences SD on the path WS→SD,
corresponding to Effect 2. Figure 3 shows the strength of
WS→SD is -.377. The rules of path analysis dictate that
the strength of WS→FB→SD is the product of the
strengths of the constituent links, WS→FB and FB→SD,
that is, (.363)(.892) = .324. The estimate of the
correlation between WS and SD, r̂_WSSD, is obtained by
summing the direct and indirect effects, that is, .324 -
.377 = -.053. This is the sum of all legal ways for WS to
influence SD given the structure in Figure 3. For the
model in Figure 3, r̂_WSSD = r_WSSD, but this doesn't
happen in general.

Thus we decompose the correlation r_WSSD into two additive
effects: WS increases FB as expected and decreases SD
(spuriously, as noted above) as expected, and these effects
cancel.
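
The decomposition can be checked numerically. The sketch below, using helper names that are ours rather than part of any path-analysis package, multiplies coefficients along each path and sums over paths, using the Figure 3 coefficients.

```python
from math import prod


def estimated_correlation(paths):
    """Sum, over all legal paths, of the product of the path coefficients."""
    return sum(prod(path) for path in paths)


# Path coefficients from Figure 3.
direct = [-0.377]            # WS -> SD (Effect 2)
indirect = [0.363, 0.892]    # WS -> FB -> SD (Effect 1)

r_hat_ws_sd = estimated_correlation([direct, indirect])
print(round(r_hat_ws_sd, 3))
```

The indirect path contributes about .324 and the direct path -.377, so the two nearly cancel and the estimate matches the observed correlation of -.053.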

Path analysis involves three steps:

1) Propose a path model (such as the one in Figure
3). The model represents causal influences with
directed arrows (e.g., FB→SD) and correlations with
undirected links (see Figure 4a).

2) Derive path coefficients (such as -.377, .363 and
.892). The magnitude of a path coefficient is interpreted
as a measure of causal influence.

3) Estimate the strength of the relationship between
two factors (such as WS and SD) by multiplying
path coefficients along paths between the factors
and summing the products over all legal paths
between the factors.
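
For step 2, one standard way to obtain coefficients for a model like Figure 3 is to treat them as standardized regression coefficients computed from the correlations. The sketch below assumes, as our reading rather than a statement from the text, that the Figure 3 values come from regressing SD on WS and FB; with the correlations reported in Table 2b it yields values that round to -.377 and .892. The function name is illustrative.

```python
def standardized_betas(r_x1y, r_x2y, r_x1x2):
    """Standardized regression coefficients for two predictors.

    Solves the normal equations for standardized variables:
        r_x1y = beta1 + beta2 * r_x1x2
        r_x2y = beta1 * r_x1x2 + beta2
    """
    denom = 1.0 - r_x1x2 ** 2
    beta1 = (r_x1y - r_x2y * r_x1x2) / denom
    beta2 = (r_x2y - r_x1y * r_x1x2) / denom
    return beta1, beta2


# Correlations quoted from Table 2b: WS-SD, FB-SD, WS-FB.
beta_ws, beta_fb = standardized_betas(r_x1y=-0.053, r_x2y=0.755, r_x1x2=0.363)
print(round(beta_ws, 3), round(beta_fb, 3))
```

Under this reading the WS→FB coefficient is simply the correlation between WS and FB (.363), since WS is the only influence on FB in the diagram.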

Step 3 is entirely algorithmic given some simple rules
(described below) that define legal paths. Step 2 involves
some judgment because some models allow multiple ways
to derive one or more path coefficients. A model is a concise
statement of hypothesized causal influences among
factors, and the space of models grows combinatorially
with the number of factors, so step 1, proposing a model,
is apt to benefit from knowledge about the system we are
modeling.

Figure 3: A Simple Path Diagram Showing Three
Variables and Their Influences. Path coefficients:
WS→FB = .363, FB→SD = .892, WS→SD = -.377;
r̂_WSSD = -.377 + (.363)(.892) = -.053.

All three steps will be clearer if we briefly describe the
relationship between multiple linear regression and path
analysis. They are basically the same thing: both derive
path coefficients for a model. The difference is simply that
one particular model is implicit in multiple regression.
Consider an elaboration of Figure 3, in which we add
RTK as an additional causal influence on SD. Figure 4a
shows the implicit model fit by multiple regression, and
Figure 4b shows a model that we think is a better
representation of what is actually going on in Phoenix.

The regression model assumes that all predictor variables
(WS, FB, RTK) are correlated, and assumes all directly
influence the criterion variable (SD). Correlated variables
are linked by undirected paths, which are labeled with the
correlations. Table 2b presents the correlation matrix
derived from our experiment. Multiple regression
generates standard partial regression (beta) coefficients for each
direct path between the predictor and criterion variables.
These are -.291, .81 and -.2 in Figure 4a. Each represents
a standardized measure of the influence of the predictor
variable on the criterion variable with the effects of the
other predictor variables held constant. The resulting
regression equation in standard format is $\widehat{SD}$ = .81 FB -
.29 WS - .2 RTK. Because the regression coefficients are
standardized they can be compared; a unit change in FB
produces .81 units change in SD, whereas a unit change in
WS produces -.29 units change in SD. FB is the strongest
influence.
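To make the "unit change" reading concrete, the following sketch evaluates the standardized regression equation quoted above. The function name and the use of z-scores as inputs are our own framing, not part of the original analysis.

```python
def predicted_sd(fb_z, ws_z, rtk_z):
    """Standardized regression equation from the text; inputs are z-scores."""
    return 0.81 * fb_z - 0.29 * ws_z - 0.2 * rtk_z


baseline = predicted_sd(0.0, 0.0, 0.0)

# Change in predicted SD for a one-unit (one standard deviation) change
# in each predictor, with the other two held at their means.
fb_effect = predicted_sd(1.0, 0.0, 0.0) - baseline
ws_effect = predicted_sd(0.0, 1.0, 0.0) - baseline
rtk_effect = predicted_sd(0.0, 0.0, 1.0) - baseline

print(fb_effect, ws_effect, rtk_effect)
```

Because all three coefficients are on the same standardized scale, comparing their absolute values ranks the predictors by influence.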

Figure 4: A Shows the Path Model Implicit in Multiple Regression. The Path Model in B
Better Captures the Relationships Among These Variables in Phoenix.

Figure 4a represents a decomposition of the correlations
between SD and the other variables. The correlations can
be reconstituted by summing the influences along paths
just as we did in Figure 3. Path analysis has three rules
for identifying paths:

1) No more than one undirected link can be part of a
path (e.g., FB-RTK->SD is legal, but
WS-FB-RTK->SD isn't).

2) A path cannot go through a node twice.

3) A path can go backward on a directed link, but not
after it has gone forward on another link (e.g.,
FB<-WS->SD in Figure 4b is legal but
#PLANS->FB<-WS in Figure 5 isn't).

The strength of each multilink path is just the product of
its constituent coefficients, so the strength of the path
FB-RTK->SD in Figure 4a is (-.249)(-.2) = .0498. The
estimated correlation between a predictor and a criterion
variable is the sum of the strengths of the paths that
connect them. Thus

$\hat{r}_{FBSD}$ = .755 = .81            (direct FB->SD path)
               + (.363)(-.291)  (FB-WS->SD)
               + (-.249)(-.2)   (FB-RTK->SD)
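The path-tracing procedure can be sketched as a small recursive search. This is a literal reading of the three rules listed above, not the authors' implementation; the function `trace`, the dictionary encoding of the models, and the placeholder value for the WS-RTK correlation (which this section does not quote) are all assumptions made for illustration.

```python
def trace(directed, undirected, source, target):
    """Enumerate legal paths from source to target and sum their strengths.

    directed:   {(cause, effect): path coefficient}
    undirected: {(a, b): correlation}
    """
    paths = []

    def walk(node, visited, strength, went_forward, used_undirected, trail):
        if node == target:
            paths.append((trail, strength))
            return
        for (cause, effect), coef in directed.items():
            # Forward along cause -> effect; rule 2: never revisit a node.
            if cause == node and effect not in visited:
                walk(effect, visited | {effect}, strength * coef,
                     True, used_undirected, trail + [effect])
            # Backward along cause -> effect; rule 3: not after a forward step.
            if effect == node and cause not in visited and not went_forward:
                walk(cause, visited | {cause}, strength * coef,
                     went_forward, used_undirected, trail + [cause])
        if not used_undirected:  # rule 1: at most one undirected link per path
            for (a, b), r in undirected.items():
                if node in (a, b):
                    other = b if node == a else a
                    if other not in visited:
                        walk(other, visited | {other}, strength * r,
                             went_forward, True, trail + [other])

    walk(source, {source}, 1.0, False, False, [source])
    return sum(s for _, s in paths), paths


def figure_4a(r_ws_rtk):
    """Regression model of Figure 4a; r_ws_rtk is a placeholder value."""
    directed = {("WS", "SD"): -0.291, ("FB", "SD"): 0.81, ("RTK", "SD"): -0.2}
    undirected = {("WS", "FB"): 0.363, ("FB", "RTK"): -0.249,
                  ("WS", "RTK"): r_ws_rtk}
    return directed, undirected


# Any FB..SD path through the WS-RTK link would need two undirected links,
# so the placeholder cannot affect this estimate.
total_4a, paths_4a = trace(*figure_4a(0.1), "FB", "SD")

figure_3 = {("WS", "FB"): 0.363, ("FB", "SD"): 0.892, ("WS", "SD"): -0.377}
total_3, paths_3 = trace(figure_3, {}, "WS", "SD")

for trail, strength in paths_4a:
    print("-".join(trail), round(strength, 4))
print("estimated r(FB, SD) =", round(total_4a, 3))
print("estimated r(WS, SD) =", round(total_3, 3))
```

For Figure 4a the search finds the three paths summed in the text; for Figure 3 it finds the direct and the FB-mediated path from WS to SD.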

So multiple regression follows the three steps of path
analysis. First, propose a model, specifically, a model in
which all predictor variables are correlated and directly
linked to the criterion. Second, estimate path coefficients,
specifically, calculate standard partial regression
coefficients for the direct paths between predictor and criterion
variables, and label the undirected links with the
appropriate correlations. Third, estimate the correlations between
each predictor and criterion variable by identifying legal
paths between them, calculating the strength of each path,
and summing the path strengths. In multiple regression,
the estimated correlations are always identical to the actual
correlations.


Multiple regression is a fine way to decompose
correlations into their component influences if you believe that
multiple regression's implicit causal model represents
your system. Multiple regression is just path analysis on
this implicit model, so if you don't believe the model you
can propose another and run path analysis on it. This is
what we did in Figure 4b. We know that WS and RTK are
independent because our experiment varied them
independently in a factorial design. (The reason they are correlated
is the sampling bias identified as Effect 2, above.) So we
want to test a model in which WS influences SD directly
and through FB, and RTK influences SD directly. The only
question is how to estimate the path coefficients. The
basic rules, which yield the coefficients in Figure 4b, are:

1) If W and X are uncorrelated causes of the criterion
variable Y, then the path coefficients $p_{YX}$ and $p_{YW}$
are just the correlation coefficients $r_{YX}$ and $r_{YW}$,
respectively.

2) If W and X are correlated causes of the criterion
variable Y, then the path coefficients $p_{YX}$ and $p_{YW}$
are the standard partial regression coefficients
$b'_{YX \cdot W}$ and $b'_{YW \cdot X}$, respectively, obtained from the
regression of Y on X and W.
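For two causes, both rules reduce to a closed-form expression in the three pairwise correlations. The sketch below uses the standard two-predictor formula for standardized partial regression coefficients; the function name is ours, and the input correlations are the ones implied by the Figure 3 coefficients rather than measured values from Table 2b.

```python
def path_coefficients(r_yx, r_yw, r_xw):
    """Return (p_yx, p_yw) for two causes X and W of a criterion Y.

    Rule 1: if X and W are uncorrelated, the coefficients are the plain
    correlations. Rule 2: otherwise they are the standardized partial
    regression coefficients from regressing Y on X and W.
    """
    if r_xw == 0.0:
        return r_yx, r_yw
    denom = 1.0 - r_xw ** 2
    p_yx = (r_yx - r_yw * r_xw) / denom
    p_yw = (r_yw - r_yx * r_xw) / denom
    return p_yx, p_yw


# Correlations implied by Figure 3 (Y = SD, X = FB, W = WS).
r_sd_fb = 0.892 + 0.363 * (-0.377)   # direct FB->SD plus FB<-WS->SD
r_sd_ws = -0.377 + 0.363 * 0.892     # direct WS->SD plus WS->FB->SD
r_fb_ws = 0.363

p_sd_fb, p_sd_ws = path_coefficients(r_sd_fb, r_sd_ws, r_fb_ws)
print(round(p_sd_fb, 3), round(p_sd_ws, 3))
```

Feeding the implied correlations back through rule 2 recovers the FB->SD and WS->SD coefficients of Figure 3, which is the round trip that makes estimated and actual correlations agree for that model.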

Is Figure 4b a better model than Figure 4a? We can
answer the question in two ways. The statistical answer
is that no model fits the data better, in terms of
accounting for variance in the criterion variable, than the
regression model. But this is hardly surprising when you
consider that the regression model assumes everything
influences everything else. The system analyst's answer
is that we don't want models in which everything
influences everything else; we want models in which
some links are left out, in which causal influences are
localized, not dissipated through a network of correlations.
Let's ask, then, what it means for such a model to be
better than another. Again, the judgment depends on how
well each accounts for the variance in the criterion variable
and how accurately each estimates the correlations between
variables, and how well each represents what we surmise
to be the causal structure of our system. Clearly, these
criteria interact. We can imagine a model that fits the data
well but cannot represent what we know to be the causal
structure, but often we explore different plausible causal
structures by seeing how well each fits the data.

The structure in Figure 5 represents one of our first
guesses at the causal structure that relates WS, FPLAN and
RTK to SD. We expected WS and FPLAN to each directly
influence both #PLANS and FB, but neither to directly
influence SD. We also expected RTK to influence #PLANS
and SD directly. We thought #PLANS might influence FB
and SD. We made these guesses based on regression
analyses, the correlation matrix in Table 2b, some of the
graphs shown earlier, and our general knowledge about
how the Phoenix planner works.

After estimating the path coefficients as shown in Figure
5, we estimated the correlations between SD and each
variable i. The estimates and the actual correlations are as
follows:

                  WS      FPLAN   RTK     #PLANS  FB
$\hat{r}_{SDi}$   .1 U    -.197   -.533   .719    .77*
$r_{SDi}$

Figure 5: Path Model Relating Variables Influencing
Shutdown Time.

Except for the disparity between the estimated and actual
correlations between WS and SD, this model accounts
pretty well for the actual correlations. At this point, we
wanted to explain the influence of RTK on #PLANS. Why
should decreasing RTK (slowing the Fireboss's thinking
speed) increase the number of plans? One explanation is
something like thrashing: there is always the possibility
that the environment will change in such a way that a
plan is no longer appropriate, but this is much more
likely when the environment changes rapidly relative to
planning effort (i.e., when RTK is decreased). Thus,
decreasing RTK means the Fireboss will have to throw
away plans before they make much progress, resulting in
an increase in #PLANS. To test this we introduced another
variable, OVUT, which measures the percentage of time in
a trial that the Fireboss spends planning. We expected
OVUT to decrease with RTK, supporting the thrashing
explanation. Figure 6 shows a modification of Figure 5,
with the path RTK->OVUT->#PLANS instead of
RTK->#PLANS.


Figure 6: Adding the Endogenous Variable OVUT.

For this model, estimated correlations between SD and all
the other variables are not appreciably different than they
were for the model in Figure 5. But it appears that the
variable OVUT does not add much to our understanding of
thrashing, because it is completely determined by RTK.
Consider what happens when we derive path coefficients
for a slightly different model (Figure 7). In this case,
OVUT has almost no influence ($p_{\#PLANS,OVUT}$ = -.02) on
#PLANS. Recall, however, that this path coefficient is the
standardized partial regression coefficient
$b'_{\#PLANS,OVUT \cdot RTK}$, with RTK held constant. The fact that
this number is nearly zero means that OVUT has no effect
on #PLANS when RTK is held constant; in other words, the
effect of OVUT on #PLANS is due entirely to RTK.

Figure 7: Showing the Effect of OVUT on #PLANS is Due
Entirely to RTK.
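The pattern described here, a strong simple correlation that vanishes once a common cause is held constant, can be reproduced on synthetic data. The sketch below is purely illustrative: the data are generated, not taken from the Phoenix experiment, and the noise levels, signs, sample size and seed are arbitrary assumptions.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


rng = random.Random(0)
n = 20000
rtk = [rng.gauss(0, 1) for _ in range(n)]
# Synthetic stand-ins: OVUT is almost a function of RTK, and #PLANS
# depends on RTK only (never on OVUT directly).
ovut = [-r + 0.3 * rng.gauss(0, 1) for r in rtk]
plans = [-r + 0.5 * rng.gauss(0, 1) for r in rtk]

r_po = pearson(plans, ovut)
r_pr = pearson(plans, rtk)
r_or = pearson(ovut, rtk)

# Standardized partial regression coefficient of #PLANS on OVUT,
# with RTK held constant (two-predictor formula).
beta_ovut = (r_po - r_pr * r_or) / (1 - r_or ** 2)

print(f"simple r(#PLANS, OVUT)  = {r_po:.2f}")
print(f"partial beta (RTK held) = {beta_ovut:.2f}")
```

In this toy setting OVUT and #PLANS are strongly correlated only because both follow RTK, so the partial coefficient is close to zero, which mirrors the interpretation given for Figure 7.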

6  CONCLUSION 

We have presented results of an experiment with the
Phoenix planner that confirm our predictions that its
performance would be sensitive to some environmental
conditions but not others. We have shown that the
planner is not sensitive to variation in initial wind speed,
a common environmental dynamic it faces. On the other
hand, our results show that performance degrades as we
change a fundamental relationship between the planner and
its environment: the rate at which the Fireboss agent
thinks. As we slowed the Fireboss's thinking speed in the
experiment by decreasing RTK, performance degraded to
the point where no plan succeeded on the first try.
However, the planner was still able to succeed in many
cases by replanning. While the success rate using
replanning also degrades, replanning acts as a buffer,
preventing the planner from failing catastrophically when
it can't think fast enough to keep up with the
environment. The data also show that replanning exerts a
large influence on SD. We have presented a causal model,
developed using path analysis, of the effects on SD of the
various independent and endogenous variables we
measured.

Replanning occurs when the environment doesn't match
the Fireboss's expectations. In the current experiment, the
rate at which the expectations became invalid was set by
RTK. But the effect was indirect: Low RTK ensured that
the Fireboss would be swamped (OVUT), which meant that
bulldozers had to wait for instructions, which, in turn,
increased the probability that they would not be able to
carry out their instructions by their deadlines. This is what
caused plans to fail. Environmental changes were only the
instrument of the problem; RTK initiated it. But
expectations, and thus plans, can also fail if the environment
itself changes. We have yet to study whether replanning
makes Phoenix robust against these changes, though our